IEEE Cloud Computing
SECURITY
Practical Use of Microservices 6
Economics of Microservices 16
SEPTEMBER/OCTOBER 2016
www.computer.org/cloudcomputing
Experience the Newest and Most Advanced
Thinking in Big Data Analytics
www.computer.org/bda
Now there's even more to love about your membership...
NO ADDITIONAL FEE
On your computer, on your eReader, on your smartphone, on your tablet.
Introducing myCS, the digital magazine portal from IEEE Computer Society.
Finally! Go beyond static, hard-to-read PDFs. Our go-to portal makes it easy to access and customize your
favorite technical publications like Computer, IEEE Software, IEEE Security and Privacy, and more. Get started
today for state-of-the-art industry news and a fully adaptive experience.
www.computer.org/epub
Editor in Chief
Mazin Yousif, T-Systems International, mazin@computer.org
Editorial Board
Pascal Bouvry, University of Luxembourg
Ivona Brandic, Vienna University of Technology
Christophe Cérin, University of Paris 13
Kim-Kwang Raymond Choo, University of Texas at San Antonio
Beniamino Di Martino, Second University of Naples
Mianxiong Dong, Muroran Institute of Technology
Keith G. Jeffery, Keith G. Jeffery Consultants and Cardiff University
David Linthicum, Cloud Technology Partners
Christine Miyachi, Xerox Corporation
Omer Rana, Cardiff University
Rajiv Ranjan, Newcastle University
Lutz Schubert, Ulm University
Alan Sill, Texas Tech University
Zahir Tari, RMIT University
Joe Weinman
Yongwei Wu, Tsinghua University
Steering Committee
Sherman Shen, University of Waterloo (chair, Communications Society liaison)
Kirsten Ferguson-Boucher, Aberystwyth University
Raouf Boutaba, University of Waterloo (Communications Society liaison)
Carl Landwehr, NSF, IARPA (EIC Emeritus, IEEE S&P)
Hui Lei, IBM
V.O.K. Li, University of Hong Kong (Communications Society liaison)
Rolf Oppliger, eSecurity Technologies
Manish Parashar, Rutgers, the State University of New Jersey
IEEE Cloud Computing (ISSN 2325-6095) is published bimonthly by the IEEE Computer Society. IEEE headquarters: Three Park Ave., 17th Floor, New York, NY 10016-5997. IEEE Computer Society Publications Office: 10662 Los Vaqueros Cir., Los Alamitos, CA 90720; +1 714 821 8380; fax +1 714 821 4010. IEEE Computer Society headquarters: 2001 L St., Ste. 700, Washington, DC 20036.
Subscription rates: IEEE Computer Society members get the lowest rate of US$39 per year. Go to www.computer.org/subscribe to order and for more information on other subscription prices.
CONTENTS
What will the future of cloud computing look like? What are some of the issues
professionals, practitioners, and researchers need to address when utilizing cloud
services? IEEE Cloud Computing magazine serves as a forum for the constantly
shifting cloud landscape, bringing you original research, best practices, in-depth
analysis, and timely columns from luminaries in the field.
THEME ARTICLES (pp. 26 and 54)

September/October 2016
Volume 3, Issue 5
www.computer.org/cloudcomputing

[Cover figure: Docker workflow. Code and a Dockerfile are built (docker build, via a github hook) into image repositories: public, private, official, and third-party repositories, Docker hub, and alternate registries. Images are pulled (docker pull) into development and production environments, with orchestration (Kubernetes) of tasks, commands, and services across Docker hosts and the Docker daemon.]

Columns
4 From the Editor in Chief: Microservices, Mazin Yousif
76 Standards Now: The Design and Architecture of Microservices, Alan Sill
Reuse Rights and Reprint Permissions: Educational or personal use of this material is permitted without fee, provided such use: 1) is not made for profit; 2)
includes this notice and a full citation to the original work on the first page of the copy; and 3) does not imply IEEE endorsement of any third-party products
or services. Authors and their companies are permitted to post the accepted version of their IEEE-copyrighted material on their own Web servers without
permission, provided that the IEEE copyright notice and a full citation to the original work appear on the first screen of the posted copy. An accepted manuscript is a version that has been revised by the author to incorporate review suggestions, but not the published version with copyediting, proofreading and
formatting added by IEEE. For more information, please go to: http://www.ieee.org/publications_standards/publications/rights/paperversionpolicy.html.
Permission to reprint/republish this material for commercial, advertising, or promotional purposes or for creating new collective works for resale or redistribu-
tion must be obtained from the IEEE by writing to the IEEE Intellectual Property Rights Office, 445 Hoes Lane, Piscataway, NJ 08854-4141 or pubs-permissions
@ieee.org. Copyright 2016 IEEE. All rights reserved.
Abstracting and Library Use: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy for private use of patrons, provided the
per-copy fee indicated in the code at the bottom of the first page is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.
IEEE prohibits discrimination, harassment, and bullying. For more information, visit www.ieee.org/web/aboutus/whatis/policies/p9-26.html.
From the Editor in Chief
Microservices (or modular architectures in general) are better suited for the many complex applications we're building these days. This includes enterprise applications (that is, confined within the enterprise) as well as Web-scale applications, where companies need to scale to reach consumers worldwide. Microservices, specifically, work well for new types of applications such as the Internet of Things, where single-function sensors and actuators are deployed in the field.
Given the complexities of our business environments, technology's role in the social fabric, and the flat-world view of things, I'm encouraged that technological evolutions have brought us microservices, containers, and DevOps.
The columns in this issue address various topics related to microservices. As mentioned earlier, David Linthicum provides a generic overview of microservices and relates them to containers and DevOps. Cloud Economics guest author Andy Singleton addresses the costs and benefits associated with microservices. In the Cloud and the Law column, Christian Esposito, Aniello Castiglione, and Kim-Kwang Raymond Choo look at the possible security challenges around microservices and related mitigation topics requiring more research. In Blue Skies, Maria Fazio, Antonio Celesti, Rajiv Ranjan, Lydia Chen, Chang Liu, and Massimo Villari look at scheduling and efficient resource management for microservices. Finally, in Standards Now, Alan Sill explores how microservices exploit both modern and historical standards and looks at the future of microservices development.
One last item of news is the addition of Christine Miyachi to IEEE Cloud Computing's editorial board (see the sidebar for a brief biography). She currently chairs the IEEE Special Technical Community of Cloud Computing and has worked diligently to expand its reach to more than 10,000 subscribers, an outstanding accomplishment!

Mazin Yousif is the editor in chief of IEEE Cloud Computing. He's the chief technology officer and vice president of architecture for the Royal Dutch Shell Global account at T-Systems International. Yousif has a PhD in computer engineering from Pennsylvania State University. Contact him at mazin@computer.org.

Read your subscriptions through the myCS publications portal at http://mycs.computer.org.
September/October 2016, IEEE Cloud Computing

Cloud Tidbits
...own domain, which will be another container that's accessed using data-oriented microservices.
Pattern three splurges on testing. Although many will point to the stability of containers as a way around black-box and white-box testing, the application now exists in a new architecture with new service dependencies. There could be a lot of debugging that has to occur up front, before deployment.
There are other sides to this as well. Lori MacVittie, one of my advisory board members, noted in an email that containers and microservices seem to mix many tangentially related topics together, but microservices have nothing to do with...

...distributed components that function together to form the applications, and are also separately scaled. For instance, the container that manages the user interface can be replicated across servers as the demand for that container goes up when users log on in the morning. This provides a handy way for cloud operations to build autoscaling features around the application, to expand and de-expand the use of cloud resources as needs change.
Most enterprises believe the cloud will become the new home for applications. However, not all applications are fit for the cloud, at least not yet. Care must be taken to select the right applications to make the move.

"When applications are put into production, those charged with cloud operations should take advantage of the container architecture."

The use of containers and microservices makes things easier. This approach forces the application developer charged with refactoring the application to think about how to best redesign the applications to become containerized and service oriented. In essence, you're taking a monolithic application and turning it into something that's more complex and distributed. However, it should also be more productive, agile, and cost effective. That's the real objective here.
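The autoscaling idea described here, replicating a container as demand rises and releasing replicas as it falls, can be reduced to a small sizing rule. The sketch below is illustrative only; the function name, capacity figure, and limits are invented, not taken from the column.

```python
# Illustrative sketch of the column's autoscaling idea: replicate the
# user-interface container as demand rises, and release replicas as it
# falls. The per-replica capacity and replica limits are assumptions.

def desired_replicas(requests_per_min, per_replica_capacity=100,
                     min_replicas=1, max_replicas=20):
    """Return how many container replicas the current load needs."""
    needed = -(-requests_per_min // per_replica_capacity)  # ceiling division
    return max(min_replicas, min(max_replicas, needed))

# The morning login surge expands the UI container, then "de-expands" it.
assert desired_replicas(40) == 1      # quiet overnight
assert desired_replicas(950) == 10    # users logging on in the morning
assert desired_replicas(5000) == 20   # demand capped at max_replicas
```

A real cloud operations team would feed such a rule from monitoring metrics rather than a raw request count, but the expand/de-expand decision has this shape.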
Cloud and the Law
Mitigation
Conventional solutions such as secure message-passing middleware, physical security for cloud infrastructure, and privacy-preserving cloud data storage schemes might only be effective for one or more components, but they're unlikely to address the security (and certainly not the legal) challenges due to microservices' unique characteristics.
Potential research to address these technical limitations or security vulnerabilities can focus on several areas. One area is microservice validation, which could be conducted both in isolation and in composition with other microservices, allowing us to identify and mitigate potential vulnerabilities in the software, implementation, or interaction between microservices.
In addition, we need a dynamic, and preferably lightweight, security monitoring and management system, responsible for monitoring and enforcing correct behavior of microservices and other related activities (such as cooperation). Once the behavior or activities deviate from the norm, the system should undertake corrective actions and/or countermeasures to bring the application to the correct...

References
1. M. Fowler, "Microservices Resource Guide," 2016; http://martinfowler.com/microservices.
2. A. Balalaie, A. Heydarnoori, and P. Jamshidi, "Microservices Architecture Enables DevOps," IEEE Software, vol. 33, no. 3, 2016, pp. 42-52.
3. S. Newman, Building Microservices: Designing Fine-Grained Systems, O'Reilly Media, 2015.
4. T. Killalea, "The Hidden Dividends of Microservices," Comm. ACM, vol. 59, no. 8, 2016, pp. 42-45.
5. F. Oliveira et al., "Delivering Software with Agility and Quality in a Cloud Environment," IBM J. Research and Development, vol. 60, nos. 2-3, 2016, pp. 10:1-10:11.
6. M. Villamizar et al., "Infrastructure Cost Comparison of Running Web Applications in the Cloud Using AWS Lambda and Monolithic and Microservice Architectures," Proc. 16th IEEE/ACM Int'l Symp. Cluster, Cloud and Grid Computing (CCGrid), 2016, pp. 179-182.
7. C.H. Costa et al., "Sharding by Hash Partitioning: A Database Scalability Pattern to Achieve Evenly Sharded Database Clusters," Proc. 17th Int'l Conf. Enterprise Information Systems (ICEIS), 2015.
8. C. Esposito, D. Cotroneo, and S. Russo, "On Reliability in Publish/Subscribe Services," Computer Networks, vol. 57, no. 5, 2013, pp. 1318-1343.
9. N. Dragoni et al., "Microservices: Yesterday, Today, and Tomorrow," 2016; https://arxiv.org/abs/1606.04036.
10. Y. Sun, S. Nanda, and T. Jaeger, "Security-as-a-Service for Microservices-Based Cloud Applications," Proc. 7th IEEE Int'l Conf. Cloud Computing Technology and Science (CloudCom), 2015, pp. 50-57.
11. F. Callegati et al., "Data Security Issues in MaaS-Enabling Platforms," Proc. Int'l Forum Research and Technologies for Society and Industry, 2016; https://hal.inria.fr/hal-01336700.
12. S. Kim et al., "High-Assurance Synthesis of Security Services from Basic Microservices," Proc. 14th Int'l Symp. Software Reliability Eng. (ISSRE), 2003, pp. 154-165.
13. D.I. Savchenko, G.I. Radchenko, and O. Taipale, "Microservices Validation: Mjolnirr Platform Case Study," Proc. 38th Int'l Convention Information and Comm. Technology, Electronics and Microelectronics (MIPRO), 2015, pp. 235-240.
14. C. Esposito and M. Ciampi, "On Security in Publish/Subscribe Services: A Survey," IEEE Comm. Surveys and Tutorials, vol. 17, no. 2, 2015, pp. 966-997.

CHRISTIAN ESPOSITO is adjunct professor at the University of Naples Federico II, Italy, and research fellow and adjunct professor at the University of Salerno, Italy. His research interests include information security and reliability, middleware, and distributed systems. Esposito has a PhD in computer engineering from the University of Naples Federico II, Italy. Contact him at christian.esposito@dia.unisa.it.
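As a rough illustration of the monitoring-and-management system proposed in the Mitigation section, one that learns a norm for each service's behavior, flags deviations, and selects a countermeasure, here is a minimal sketch. The latency metric, threshold, and names are assumptions made for this example, not a design from the column.

```python
# Illustrative sketch (not from the column): a lightweight monitor that
# learns a per-service norm, flags behavior that deviates from it, and
# selects a corrective action.
from statistics import mean, pstdev

class BehaviorMonitor:
    def __init__(self, threshold=3.0):
        self.history = {}           # service name -> observed latencies (ms)
        self.threshold = threshold  # beyond N std-devs counts as abnormal

    def observe(self, service, latency_ms):
        self.history.setdefault(service, []).append(latency_ms)

    def is_deviant(self, service, latency_ms):
        baseline = self.history.get(service, [])
        if len(baseline) < 10:      # too little data to define a norm
            return False
        mu, sigma = mean(baseline), pstdev(baseline)
        return sigma > 0 and abs(latency_ms - mu) > self.threshold * sigma

def countermeasure(service):
    # Stand-in for a real corrective action (restart, isolate, alert).
    return f"restart {service}"

monitor = BehaviorMonitor()
for t in range(20):
    monitor.observe("auth", 10.0 + (t % 3))   # normal latencies: 10-12 ms

assert not monitor.is_deviant("auth", 11.0)   # within the learned norm
if monitor.is_deviant("auth", 500.0):         # deviation triggers a response
    action = countermeasure("auth")
assert action == "restart auth"
```

A production system would of course monitor many signals (cooperation patterns, call graphs, error rates), but the deviate-then-correct loop is the core idea.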
Special Issue on Multicloud
Submission deadline: 2 January 2017
Publication date: July/August 2017
As cloud computing evolved to a widely used computing-as-a-service model, limitations and intrinsic characteristics of monolithic cloud provider offerings emerged. Moreover, specialized computing power such as clusters, GPUs, solid state storage, and specific applications at different service levels can now be acquired as services from different providers. A combination of cloud services from various providers can be used to work around the limitations of a single provider and enhance application execution by gathering together the necessary specific, on-demand resources for a wide range of applications.

This IEEE Cloud Computing Magazine Special Issue on Multicloud aims to cover all aspects of connecting multiple clouds to allow automatic, transparent, and on-demand application execution that takes advantage of the synergy among resources of different providers. For this synergy to become effective and efficient, connecting different providers across their boundaries raises new, challenging problems. Multicloud deployment must solve challenges that include resource management and scheduling, identity management, trust and security issues, business models, and incentive mechanisms in multicloud environments. We invite authors to submit outstanding and original manuscripts on the following topics within the context of multiclouds:

• brokering mechanisms,
• resource discovery and management,
• security and privacy,
• authentication and authorization,
• applications and case studies,
• auditing and accounting,
• multicloud APIs,
• monitoring,
• data management,
• performance modeling and evaluation,
• cloud federations,
• scheduling and load balancing,
• hybrid clouds,
• autonomic management,
• multicloud and the Internet of Things,
• QoS and QoE,
• economic and business models,
• cross-service-level management (IaaS, PaaS, SaaS, and XaaS),
• incentive mechanisms, and
• multiclouds and green computing.

Guest Editors
Dr. Luiz F. Bittencourt, University of Campinas
Dr. Rodrigo N. Calheiros, University of Melbourne
Dr. Craig A. Lee, Aerospace Corporation

Submission Information
Submissions should be 3,000 to 5,000 words long, with a maximum of 15 references, and should follow the magazine's guidelines on style and presentation (see https://www.computer.org/web/peer-review/magazines for full author guidelines). All submissions will be subject to single-blind, anonymous review in accordance with normal practice for scientific publications. For more information, contact the guest editors at cc4-2017@computer.org.
Authors should not assume that the audience will have specialized experience in a particular subfield. All accepted articles will be edited according to the IEEE Computer Society style guide (www.computer.org/web/publications/styleguide). Submit your papers through Manuscript Central at https://mc.manuscriptcentral.com/ccm-cs.
www.computer.org/cloudcomputing
Cloud Economics
Costs of Microservices
Although microservices approaches offer substantial benefits, a microservices architecture requires extra machinery, which can impose substantial costs. It also often requires extra code to communicate between services. Instead of making simple function calls, you'll define API calls or messages, and implement API calls on each end.
In addition, you need systems such as service catalogs and messaging and queuing services to discover and then route calls to the correct service instances. When you make a call to a microservice, you'll go through a proxy or messaging layer that finds a suitable instance. Because microservices can be event-processing scripts, containers, or entire virtual machines, you'll also want a systematic way to package and deploy them.
A microservices architecture must include systems to monitor service performance and behavior as well as special techniques to handle errors. When a microservice isn't responding, there is no simple way for other services to understand the error or even see there is a problem. You'll need extra code and monitoring to make sure problems get handled as errors, rather than just piling up or cascading into a catastrophe.
Finally, you'll need to have new discussions about when to include functions inside one service, and when to break them into separate services.

...more monolithic system. If you have a large system that changes frequently and does need to scale, you benefit from a microservices architecture through several mechanisms. It's much easier to test and release the smaller components. You get greater reliability through redundancy and scalability because of your ability to increase the instances of any service that's a bottleneck. You get greater quality through reuse of field-proven components packaged into microservices.
Although there are, no doubt, exceptions, I currently believe that if you have fewer than about 60 people working on your system, you don't need a microservices architecture. Over this amount of product complexity, you'll probably benefit from a microservices approach.
A high volume or complex system will inevitably move to Web services and then to smaller microservices. There's a limit to the transaction volume and throughput that a single server can handle. Beyond
that point, you'll have multiple servers, and routing and load-balancing machinery typical of a service architecture. There's also a limit to the number of functions that can be tested and maintained on that server. Beyond that point, you'll have multiple services. Most modern business systems are well beyond these two points.

Release Frequency and Agility
Monolithic applications include a lot of functions that need to be tested, so they take a long time to test and release, sometimes a month or more. A monolithic architecture works well for an installed software product that requires several months to be distributed, tested, and installed at customer sites.
In contrast, a microservices-centric development team can test and release changes to smaller components more than once per day. A company like Amazon with thousands of services can make more than a thousand changes per day, fixing problems and adding new features. This type of continuous delivery becomes a powerful tool for developers that update cloud-based software-as-a-service (SaaS) and online systems, which in turn means that continuous delivery is also a powerful tool for business agility and competitive advantage.
Large software teams often have problems with merging code, in addition to testing. We often see small teams of one to seven programmers making changes and then merging these changes with the work of other teams. Any conflicts in the changes from other teams need to be studied and fixed by hand. If many groups are merging, the process becomes difficult and unreliable. The microservices architecture solves this problem by skipping the merges. Each team can run an integration test on its code, and release it directly as a packaged service.
A microservices architecture provides a simple way to change components without spreading problems throughout the system. If you make a significant change to a component, you'll want to know that this doesn't cause problems in the consumers that use the component. You don't want to test and fix every consumer. Consumers will be reliable if they have access to a stable API. So, microservices maintain their old APIs and behavior, even while they provide new features in new API calls. In fact, this is one way to determine how to right-size a service: making it small and simple enough that the API can remain stable.
Microservices approaches typically utilize automated test scripts that run in a continuous integration system. Such scripts help ensure that localized changes that shouldn't impact the larger system in fact don't. Continuous integration typically exploits a layer of automated testing that's more extensive, reliable, and efficient than the tests that run on the output of a larger application. This drives down the time and cost of testing.
All of these techniques add up to the practice of continuous delivery, as practiced by SaaS and online service companies. Systems are chopped up into an array of microservices. Each microservice is assigned to a development team, which monitors it, fixes it, and releases improvements whenever they're ready, whether once a month, once a week, once a day, or even more often. They maintain their APIs and feed their services into a continuous integration system to make sure that the whole system works correctly. This continuous process is more adaptable and easier to manage than the older Scrum-style agile development with its two-week cadence.
You'll benefit from a microservices architecture if you run continuous delivery of online services, if you do a lot of work merging code, or if you have long test cycles or high test expenses.

Component Size
Bigger components (monolithic applications and macroservices) are easier to operate and have less...
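The stable-API advice above, keep old APIs and behavior intact while shipping new features as new API calls, can be sketched as additive versioning. The endpoint names and fields below are hypothetical, not from any actual service.

```python
# A sketch of the additive-versioning discipline: the v1 contract stays
# frozen while new behavior ships as a new call. Endpoint names and
# fields are hypothetical.

def get_order_v1(order_id):
    # Original contract: consumers depend on exactly these fields.
    return {"id": order_id, "status": "shipped"}

def get_order_v2(order_id):
    # New feature arrives as a new API call; v1 consumers are untouched.
    order = get_order_v1(order_id)
    order["tracking_url"] = f"https://example.test/track/{order_id}"
    return order

ROUTES = {
    "/v1/orders": get_order_v1,   # old API kept alive for old consumers
    "/v2/orders": get_order_v2,   # new features under a new name
}

def handle(path, order_id):
    return ROUTES[path](order_id)

# Existing consumers keep working even after v2 is deployed.
assert handle("/v1/orders", 7) == {"id": 7, "status": "shipped"}
assert "tracking_url" in handle("/v2/orders", 7)
```

Keeping a service small enough that its contract can stay frozen this way is exactly the right-sizing test the column describes.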
...incoming events and uses compute resources only when such events or API calls arrive. This has advantages of simplicity and efficiency, and it can reduce compute costs by 90 percent or more.

Data Size, Consistency, and Complexity
When your database is small, you can provide it to all parts of a big application, and a monolithic architecture is practical. In the 1980s, data architects learned to normalize databases, so that each piece of data was in one place, and they were sure that the data was correct and not contradicted elsewhere. This seems important for data like a bank account balance. However, this central data store propagates problems when you change the data structure, and you need to test all of the code that uses the data. And, as data volumes increase, it becomes difficult or even impossible to put the changes in one place. So, the central shared database has become outdated.
A monolithic architecture is often based on a reliable transaction approach typically referred to as ACID: atomic (all or nothing), consistent, isolated (transactions lead to identical outcomes whether executed in parallel or serially), and durable (permanent, even in the face of a disaster). It has proven to be impossible to run ACID systems at very large scale. Large systems must live in a BASE world of basic availability, soft state, and eventual consistency.5 In the BASE world, we often get data through API calls to microservices.6
When the database is very big, or data structure changes frequently, or data flows at high volume, you'll want a service architecture in which data is encapsulated into services, and other parts of the app get it with API calls. In addition, the structure of the API calls should be stable and backward compatible, even when the underlying data or schema changes. The architecture should also include an expandable number of data handling nodes. Finally, the service architecture should allow data to be replicated in an eventually consistent way to all nodes.

...based microservices architecture if you're dealing with any of the following types of complexity:

• large software systems with large numbers of developers or long and expensive test cycles,
• a competitive environment that requires the rapid upgrading and release of online systems or business services,
• multiple software-based products or online services,
• migration from building and maintaining systems to buying more components that will be continuously upgraded by vendors,
• integration with systems on different platforms,
• high volume of usage on cloud-based platforms, or
• large flow of data, or rapidly changing data structures.

Your services will become cells in the API economy of the cloud, the largest and most powerful computer system ever conceived.

References
1. C. Jones, Software Assessments, Benchmarks, and Best Practices, Addison-Wesley, 2000.
2. F.P. Brooks, Jr., The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, 1975.
3. R. Potvin, "The Motivation for a Monolithic Codebase: Why Google Stores Billions of Lines of Code in a Single Repository," presentation, @Scale conference, 2015; www.youtube.com/watch?v=W71BTkUbdqE.
4. J. Weinman, Cloudonomics: The Business Value of Cloud Computing, John Wiley & Sons, 2012.
5. D. Pritchett, "BASE: An ACID Alternative," ACM Queue, vol. 6, no. 3, 2008, pp. 48-55.
6. E.A. Brewer, "Towards Robust Distributed Systems," Principles of Distributed Computing, 2000; https://people.eecs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf.
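The BASE behavior this column cites, soft state on each node that converges to consistency as replicas synchronize, can be illustrated with a toy last-writer-wins store. The replica structure and sync step are assumptions made for the sketch, not an implementation from the column.

```python
# Toy illustration of BASE: writes land on one replica, reads elsewhere
# may briefly be stale (soft state), and a sync step makes all replicas
# converge (eventual consistency). Last-writer-wins by version number.

class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))[1]

def sync(replicas):
    """Propagate the highest-versioned value of each key everywhere."""
    merged = {}
    for replica in replicas:
        for key, (version, value) in replica.data.items():
            if version > merged.get(key, (0, None))[0]:
                merged[key] = (version, value)
    for replica in replicas:
        replica.data.update(merged)

a, b = Replica(), Replica()
a.write("balance", 100, version=1)  # the write hits replica a only
assert b.read("balance") is None    # b is briefly stale (soft state)
sync([a, b])                        # replicas exchange state
assert b.read("balance") == 100     # ...and eventually agree
```

This is why the column pairs BASE with stable APIs: consumers fetch data through service calls and must tolerate briefly stale answers rather than expecting ACID guarantees.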
IEEE Cloud Computing magazine seeks accessible, useful papers for a special issue on Cloud-Native Applications and Architecture. Many applications in enterprises are not able to leverage the advantages of cloud computing without a great deal of refactoring, a process that is costly, time consuming, and often produces disappointing results. However, over the last five years we have seen cloud software architectures evolve that promote the design of applications that, from conception to deployment, are envisioned, prototyped, and built with cloud tools and cloud resources. These cloud-native applications are born and run in the cloud and follow new classes of design and maintenance patterns.

The purpose of the special issue is to urge the research community to better define and document the cloud-native movement. Topics of interest include but are not limited to:

• Frameworks to make it easier for industry to build cloud-native applications;
• Educational approaches and community-based organizations that can promote cloud-native design concepts;
• The tooling to develop cloud-native applications;
• The role of open source for building cloud-native applications;
• VM and container orchestration systems for managing cloud-native designs;
• Cloud-native applications running in hybrid cloud or migrated from one cloud to another;
• Efficient mechanisms to make legacy applications cloud-native;
• Comparing applications, one cloud-native and the other not, in terms of performance, security, reliability, maintainability, scalability, etc.;
• Cloud-native applications for various industry sectors (engineering, financial, scientific, health);
• Cloud-native operating systems and databases; and
• New models for capacity planning and pricing inspired by cloud-native architecture paradigms.

Special Issue Guest Editors
Roger Barga, Amazon AWS
Dennis Gannon, Indiana University
Neel Sundaresan, Microsoft Corporation

Submission Information
Submissions should be 3,000 to 5,000 words long, with a maximum of 15 references, and should follow the magazine's guidelines on style and presentation (see https://www.computer.org/web/peer-review/magazines for full author guidelines). All submissions will be subject to single-blind, anonymous review in accordance with normal practice for scientific publications. For more information, contact the guest editors at cc5-2017@computer.org.
Authors should not assume that the audience will have specialized experience in a particular subfield. All accepted articles will be edited according to the IEEE Computer Society style guide (www.computer.org/web/publications/styleguide). Submit your papers through Manuscript Central at https://mc.manuscriptcentral.com/ccm-cs.
www.computer.org/cloudcomputing
Guest Editors Introduction
Cloud Security
Peter Mueller, IBM Zurich Research Laboratory
Chin-Tser Huang, University of South Carolina
Shui Yu, Deakin University
Zahir Tari, RMIT University
Ying-Dar Lin, National Chiao Tung University
...both private and public clouds operated by multiple providers, with customized security requirements as well as self-management for reducing administration complexity. The authors present the Supercloud security architecture along with several use cases to illustrate its practical applicability.

We hope you enjoy reading these five articles and expect that the publication of this special issue will both increase public awareness of the significance of cloud security and inspire further investigation on the development and enhancement of state-of-the-art cloud security solutions.

PETER MUELLER is a research staff member at IBM Research. His research interests include datacenter storage security and reliability and high-frequency technology. He's a senior member of IEEE and a member of the Society for Industrial and Applied Mathematics, the Electrochemical Society, and the Swiss Physical Society. Contact him at pmu@zurich.ibm.com.

CHIN-TSER HUANG is an associate professor of computer science and engineering at the University of South Carolina, where he's the director of the Secure Protocol Implementation and Development Laboratory. His research interests include network security, network protocol design and verification, and distributed systems. Huang has a PhD in computer science from the University of Texas at Austin. He's a senior member of IEEE and ACM, and a member of Sigma Xi and Upsilon Pi Epsilon. Contact him at huangct@cse.sc.edu.

SHUI YU is a senior lecturer in the School of Information Technology at Deakin University, Australia. His research interests include networking theory, network security, privacy and forensics, and mathematical modeling. He's a senior member of IEEE. Contact him at shui.yu@deakin.edu.au.

ZAHIR TARI is a full professor of distributed systems at RMIT University, Australia. His research interests include system performance (for example, webservers, peer to peer, and cloud computing) and system security (such as SCADA systems and the cloud). Tari has a PhD in computer science from the University of Grenoble, France. Contact him at zahir.tari@rmit.edu.au.
Intelligence in the Cloud
Submission deadline: 1 May 2017 | Publication date: November/December 2017

Artificial intelligence (AI), since its birth in the 1950s, has been heralded as the key to our civilization's brightest future. To pursue the vision of AI, various machine learning approaches (for example, deep learning, supervised learning, unsupervised learning, reinforcement learning, and so on) have been proposed, and a few have actually been developed and deployed in the market. The recent hype around big data has enthusiastically renewed the call and focus for advanced machine learning technologies to extract knowledge from large data pools. With its rich resource provisioning, cloud computing is widely regarded as an ideal platform to facilitate resource-intensive machine learning so as to enable intelligence in the cloud. Integrating intelligence into the cloud is without doubt a promising development trend for both cloud computing and AI.

We are still at the early stage of integrating intelligence into the cloud. Toward this exciting future, the path still entangles many critical challenges in different aspects.

At the application layer, cloud-based efficient and powerful AI techniques are highly in demand that target various applications such as natural language processing, stock analysis, medical diagnosis, intelligent industry control, intelligent transportation, and scientific discovery.

At the platform layer, while intelligence has been deployed (for example, Spark's scalable machine learning library MLlib and Google's cloud machine-learning framework TensorFlow), new machine learning engines are expected for emerging computing frameworks (for example, the dataflow computing model HAMR).

At the infrastructure layer, new cloud computing architectures and resource scheduling strategies are required to support computation-intensive and IO-intensive machine learning algorithms. How to configure cloud computation, storage, and networking resources for fast, efficient, and scalable machine learning must still be addressed.

The goal of this special issue is to seek original articles examining the state of the art, open research challenges, new solutions, and applications for intelligence in the cloud with special focus on, but not limited to, the following topics:

- new distributed architectures for machine learning;
- new machine learning engines in the cloud;
- analytics architectures, frameworks, and models for complex intelligent systems;
- intelligent cloud applications or services such as intelligent traffic, intelligent buildings, intelligent environments, intelligent businesses, and so on;
- cloud resource allocation and optimization through machine-learning algorithms;
- machine learning for cloud resource management;
- combining human and machine intelligence in the cloud; and
- security and privacy issues for intelligent systems in the cloud.

Special Issue Guest Editors
Song Guo, The Hong Kong Polytechnic University, Hong Kong
Victor Leung, University of British Columbia, Canada
Xin Yao, University of Birmingham, UK

Submission Information
Submissions should be 3,000 to 5,000 words long, with a maximum of 15 references, and should follow the magazine's guidelines on style and presentation (see https://www.computer.org/web/peer-review/magazines for full author guidelines). All submissions will be subject to single-blind, anonymous review in accordance with normal practice for scientific publications. Authors should not assume that the audience will have specialized experience in a particular subfield. All accepted articles will be edited according to the IEEE Computer Society style guide (www.computer.org/web/publications/styleguide). Submit your papers through Manuscript Central at https://mc.manuscriptcentral.com/ccm-cs. For more information, contact the guest editors at ccm6-2017@computer.org.
Cloud Security
Online Analysis of Security Risks in Elastic Cloud Applications
Athanasios Naskos and Anastasios Gounaris, Aristotle University
of Thessaloniki
Haralambos Mouratidis, University of Brighton
Panagiotis Katsaros, Aristotle University of Thessaloniki
Security Concerns and Horizontal Scaling in Public Clouds

Public and hybrid clouds, unlike private clouds, offer resources to arbitrary customers, or tenants. Tenants control neither the cloud's security policy nor the types of other tenants whose VMs are collocated on the same physical machines. Although this doesn't necessarily imply that public clouds are insecure, many organizations cite it as a reason for hesitating to migrate applications to the cloud.1

A 2013 Cloud Security Alliance report identified data breaches due to malicious cotenants as the top cloud-related security threat.2 These data breaches can lead to both data leakage, the unauthorized disclosure of data from one user to another, and data loss, a condition where data is destroyed and becomes unavailable. In addition, in a multitenant environment, a lack of authorization mechanisms for sharing physical resources increases the risk of threats such as service traffic hijacking, which occurs when attackers hijack cloud accounts by stealing security credentials and eavesdropping on activities and transactions, and side-channel attacks, which use information obtained from bandwidth monitoring or other similar techniques. Moreover, when multiple tenants share an underlying infrastructure, the risk of threats related to misconfiguration and uncoordinated change control increases, allowing a malicious tenant to gain access to another tenant's resources.

Keeping the number of VMs as low as possible is an indirect way to mitigate data leakage- and loss-related security concerns but might entail an unacceptable compromise on performance. Performance is one of the top three most studied service-level agreement (SLA) parameters, since critical applications require responses in a fixed, short time period.3 Therefore, a reliable cloud application must both address security concerns and honor SLAs.

In the example in Figure 1, an elastic NoSQL database serves user requests according to the Yahoo Cloud Serving Benchmark (YCSB).4 The figure refers to a fixed rate of user requests and shows how the average and standard deviation values of the response latency vary with the number of VMs used. Thus, a strict threshold on latency would force the system to acquire additional VMs, the exact quantity of which needs to be computed at runtime according to the current workload and considering the system's volatility. However, increasing the number of VMs, and assuming that each VM runs on a different physical machine in the generic case, also increases the probability of a malicious tenant being collocated. This probability might vary according to the cloud provider type,5 but the important issue is that it is not negligible. Choosing a single trustworthy provider isn't sufficient either, given that providers might offer VMs from other providers as well in periods of very high demand.6 Overall, as noted earlier, adding VMs poses a threat of data leakage and data loss. The magnitude of this threat depends on the application. In the figure, the threat of data loss is lower than in typical applications, because NoSQL databases are replicated at least two or three times, making it harder for a malicious user to destroy all copies.

[Figure 1. Execution plans based on different cost metrics. Increasing the number of virtual machines (VMs) leads, on average, to lower response times, subject to increased probability of a malicious tenant being collocated.]
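The intuition that adding VMs raises the collocation risk can be made concrete with a small sketch. This is an illustration only, not the article's model: it assumes each VM independently lands on a physical machine that also hosts a malicious tenant with some probability p, so the chance that at least one of n VMs is exposed is 1 - (1 - p)^n (the per-VM probability p here is a made-up value).

```python
def collocation_probability(n_vms: int, p_per_vm: float) -> float:
    """Probability that at least one of n_vms is collocated with a
    malicious tenant, assuming independent placement per VM."""
    return 1.0 - (1.0 - p_per_vm) ** n_vms

if __name__ == "__main__":
    # VM counts in the range shown in Figure 1, with a hypothetical p = 0.02
    for n in (8, 12, 18):
        print(n, round(collocation_probability(n, 0.02), 4))
```

Even with a small per-VM probability, the risk grows monotonically with the cluster size, which is exactly the tension with latency-driven scale-out that the article describes.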
[Figure 2. A conceptual model of an elastic application that considers both security and performance service-level agreement (SLA) requirements. Each state represents a specific configuration of the application in terms of number of VMs, along with its security- and performance-related properties, at a given point of time. The transition between states is due to elasticity actions.]
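The state notion in Figure 2 can be captured in a small data structure. The following sketch uses the article's notation (VM counts v and w, cost m, probabilities x, y, z, and the derived k = (1 - x)(1 - y)(1 - z) under the stated independence assumption); the class itself is illustrative, not the authors' code.

```python
from dataclasses import dataclass

@dataclass
class ElasticState:
    """One conceptual state of the elastic application (Figure 2)."""
    v: int      # number of VMs of type/provider v
    w: int      # number of VMs of type/provider w
    m: float    # total deployment cost
    x: float    # probability of data leakage
    y: float    # probability of data loss
    z: float    # probability of a performance-related SLA violation

    @property
    def k(self) -> float:
        # Probability of no security threat or performance violation,
        # assuming the three probabilities are statistically independent.
        return (1 - self.x) * (1 - self.y) * (1 - self.z)

def leakage_probability(v: int, w: int, dl_v: float, dl_w: float) -> float:
    """x = 1 - (1 - dl_v)^v * (1 - dl_w)^w, where the per-machine leakage
    probabilities dl_v and dl_w are independent of the VM count."""
    return 1 - (1 - dl_v) ** v * (1 - dl_w) ** w
```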
Because of the inherent trade-offs among security and performance requirements, any solution to analyze and enforce security-aware horizontal scaling for cloud applications must be risk based. It should account for the probability of data leakage and data loss; dynamic evolution of the external environment and volatility of system behavior; and potential heterogeneity of the cloud infrastructure.

We advocate the use of a formal verification approach as a means to apply mathematical reasoning for providing security-oriented probabilistic guarantees for elastic cloud applications: specifically, probabilistic model checking7 on top of system models in the form of Markov decision processes (MDPs), which are instantiated on the fly. Our technique can analyze and provide evidence for key security-related aspects of the running applications, for example, to answer questions such as, What is the probability that there will be a data leakage in the next hour? Moreover, it can drive elasticity decisions, taking into account security constraints (for example, given the current query load and the prediction for this load in the next half hour) to decide how many VMs to add or remove.

Using probabilistic model checking to analyze and drive elasticity decisions has shown promising initial results.4 Its potential in addressing security requirements has also been demonstrated.8 Most of the work in cloud security focuses on identifying risks, vulnerabilities, security mechanisms, digital signatures, access control, and manners to attain security assurance, such as monitoring, certificates, and auditability.9 However, the detailed investigation of security assurance during horizontal scaling, and even more, the security-aware elasticity decision making that we hereby enable, is novel in the fields of both cloud security9,10 and dynamic resource allocation in clouds.11,12

Modeling Elastic Applications
We advocate a model-based approach. Figure 2 shows a conceptual model of an elastic application that considers both security and performance SLA requirements and is deployed on a public cloud. Each curved rectangle represents a conceptual state at a specific time instant. We model the elastic application's evolution as a transition to a state at a future point in time t + Δt through elastic actions, such as adding or removing VMs.

For each state, we capture the features of interest for the analysis and decision making:

- the mixture of VM types employed, which we assume are of two different types and/or providers, v and w;
- the total deployment cost, m;
- the probability of data leakage, x;
- the probability of data loss, y;
- the probability of performance-related SLA condition violations, z; and
- the probability of no security threats or performance violations, k.

We consider all these probabilities statistically independent, so k = (1 - x)(1 - y)(1 - z). Further, we can safely regard the probability of data leakage on a single machine of type A, dl_A, as independent of the number of VMs of type A employed. So, x = 1 - (1 - dl_v)^v (1 - dl_w)^w. Similarly, we can define y as a function of the number and type of VMs employed.

The system's evolution due to elasticity actions refers to discrete time intervals of period Δt. We consider three actions: add, remove, and no_change. In the generic case, the effects of actions at time t might be delayed and not manifested at the next period but after multiple time points. For example, adding a new VM to serve a NoSQL database implies that a new VM needs to be created, booted, and configured, and it needs to receive data, which might take more than Δt time. During this process, the system should be in a transient state.

MDP Implementation of the Conceptual Model
We implement our technique using the MDP modeling approach. We chose MDP because it enables analysis and decision making and can capture nondeterminism and uncertainty in a given system.13 Both properties are essential in an elastic cloud application. Because of horizontal scaling, at each time point, the number of VMs can increase, remain the same, or decrease. This gives rise to nondeterminism. Also, at any given point, there might or might not be a performance or security-related requirement violation. This necessitates the modeling of uncertainty.

MDPs are specified using states, actions, probabilities, and rewards. The states represent system snapshots at specific time points, which are characterized by a set of system properties. The actions are transitions between the states, which express some change to the state properties. The probabilities refer to each triple (state s)-(action a)-(state s′) and represent the probabilities of transition from one state to another due to a specific action, thus quantifying uncertainty. Finally, the rewards are used to perform quantitative analysis (or solution) of MDP models.

The conceptual model needs to be implemented as an MDP according to the analysis requirements. Because we need to explicitly consider different types of application behavior during analysis and elasticity decision making, we map each conceptual system state to multiple MDP model states, one for each behavior type. The behavior type is defined according to the application nonfunctional requirements. Let an application set three requirements: to avoid data leakage, data loss, and latency above a user-specified threshold. Then, each combination of a binary variable that indicates the satisfaction of each requirement defines a behavior type (see Figure 3). Furthermore, each MDP state is annotated as to whether or not it refers to a transient state (not shown in the figure).

The actions are the same as in the conceptual model. The only difference is that, if the MDP state is transient, only no_change is allowed, because making further resizing decisions during unstable periods is prone to suboptimal decision making. The next step is to define the transition probabilities. Figure 4 gives a complementary view of Figure 3, where each path from s to s′ corresponds to an MDP model state. The probabilities p0 to p7 in Figure 3 are the product of the probabilities in Figure 4 along the corresponding path. For example, p7 = (1 - x)(1 - y)(1 - z).

[Figure 3. Mapping of a conceptual state to a set of Markov decision process (MDP) states. Each MDP state corresponds to a distinct behavior type.]
[Figure 4. Setting MDP model probabilities for state transitions based on past log entries referring to the same mixture of VM types and prediction of future external load.]
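As a rough illustration of the mapping in Figures 3 and 4, the eight behavior types can be enumerated as combinations of three binary outcomes, with each path probability the product of its branch probabilities. This is a sketch of the idea, not the authors' implementation; the function and variable names are illustrative.

```python
from itertools import product

def behavior_state_probs(x: float, y: float, z: float) -> dict:
    """Map one conceptual state to 8 MDP states, one per behavior type.

    Each behavior type is a combination of three binary outcomes:
    data leakage (probability x), data loss (probability y), and a
    performance SLA violation (probability z). The probability of a
    path is the product of its branch probabilities, as in Figure 4.
    """
    probs = {}
    for leak, loss, perf in product([True, False], repeat=3):
        p = ((x if leak else 1 - x)
             * (y if loss else 1 - y)
             * (z if perf else 1 - z))
        probs[(leak, loss, perf)] = p
    return probs

p = behavior_state_probs(0.05, 0.01, 0.2)
assert abs(sum(p.values()) - 1.0) < 1e-9  # the 8 state probabilities sum to 1
# p7 in the article's notation: no leakage, no loss, no violation
p7 = p[(False, False, False)]             # equals (1 - x)(1 - y)(1 - z)
```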
In general, for a given initial state s and action a, the transition probabilities must satisfy

∑_{s′} p(s, a, s′) = 1,

and multiple actions can be plausible for each state.

MDP Instantiation
To serve online analysis (which we discuss later in detail), the model is instantiated on the fly. To this end, the decision depth and the probabilities take actual values. Decision depth refers to how many periods the model can account for. If the depth is too small, the system becomes too short-sighted; if it's too big, the prediction uncertainty increases. Both situations lead to suboptimal decisions. Clearly, the number of MDP states grows exponentially in the number of Δt periods. If a period represents 5 minutes of real time, a model with depth set to 4 refers to a scenario where elasticity is reassessed every 5 minutes, and the model looks ahead for 20 minutes. If the system evolves less rapidly, the application manager could map each period to a longer time. In general, setting Δt appropriately heavily relies on the application environment's volatility (for example, if during night hours the workload remains roughly stable, Δt can be increased).

In our approach, we derive the probabilities x, y, and z through logs. We analyze past log entries referring to the same mixture of VM types as the state of interest to estimate the probabilities. For performance-related metrics, such as latency, we consider not only the number of VMs but also the external load of incoming requests. This implies the need to add a load prediction component, as Figure 4 shows. In general, for our approach to be applicable, we assume that a security analysis and profiling mechanism is in place, which is capable of deriving attack probabilities as a function of the cluster configuration. Such a mechanism is orthogonal to our approach and can be even more sophisticated. For example, upon instantiation, it could account for whether any VM additions would involve the use of a new physical machine rather than deploying VMs on one that's already in use. Finally, since the models are instantiated on the fly, the attack probabilities can be dynamically refined.

Online Analysis and Decision Making
The analysis is based on verification of models instantiated on demand. To this end, we couple the probabilistic model with Probabilistic Computation Tree Logic (PCTL), a probabilistic property specification language that's fed to the Prism model checker.14 We also show how the analysis can directly support decision making with regard to elasticity.

Prism can efficiently analyze complex models. We have been able to solve MDP models with 9,958 states corresponding to four periods in 0.073 seconds using a machine with a quad-core CPU and 8 Gbytes of RAM, while the program is reported to have processed models of up to 10^11 states on a single machine.14

Examples of Verified Analysis Using Prism
Figure 5 gives a concise view of the analysis of two PCTL properties. For simplicity, we grouped the eight states of Figure 3 in two groups according to the data leakage property. Further, we assume that
Table 1. Example analysis questions and their PCTL specifications.

1. What is the maximum probability (among all possible adversaries) of experiencing a data loss incident until eventually moving to a state with no data loss?
   Pmax = ? [data loss U F !data loss & steps = max_steps]

2. What is the maximum probability (among all possible adversaries) of moving from a state with data leakage to a state with no data leakage?
   Pmax = ? [data leakage U !data leakage]

3. Starting from any reachable state, is it always possible (that is, is there at least one adversary) to eventually reach a state with no data leakage?
   filter(exists, P >= 1 [F !data leakage & steps = max_steps])

4. Starting from a state with no data loss, do all adversaries eventually reach a state with no data loss?
   filter(forall, P >= 1 [F !data loss & steps = max_steps], !data loss)

5. What is the maximum probability of experiencing data loss in a state that immediately follows the initial state, while the probability to end up at a state with no data loss is greater than or equal to 0.9?
   multi(Pmax = ? [X data loss], P >= 0.9 [F !data loss & steps = max_steps])

6. What is the maximum probability of having total cost of deployment less than a specified budget, while the probability of experiencing any security incident does not exceed 0.05?
   multi(Pmax = ? [F total cost <= Budget & steps = max_steps], P <= 0.05 [G data leakage & data loss])
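The log-driven estimation of x, y, and z described under MDP Instantiation can be sketched as follows. The log record fields and the estimator (violation frequency among entries with the same VM mixture and a similar external load) are illustrative assumptions on my part, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    v: int                    # VMs of type v at the time of the entry
    w: int                    # VMs of type w
    load: float               # external request load (requests/sec)
    latency_violation: bool   # did latency exceed the SLA threshold?

def estimate_z(logs, v, w, load, load_tol=0.1):
    """Estimate z, the probability of a latency SLA violation, from past
    log entries with the same VM mixture and a load within load_tol
    (relative) of the predicted load."""
    matching = [e for e in logs
                if e.v == v and e.w == w
                and abs(e.load - load) <= load_tol * load]
    if not matching:
        return None  # no evidence; fall back to a prior or the profiling mechanism
    return sum(e.latency_violation for e in matching) / len(matching)
```

A call such as `estimate_z(history, v=4, w=2, load=1000.0)` would return the empirical violation rate for that configuration, to be refreshed each time the model is instantiated.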
[Figure 6. Example of security-aware elasticity decisions. The number of VMs follows the incoming load. When security requirements are taken into consideration, the elasticity actions tend to be more conservative.]

Decision Making
MDPs are inherently suited for decision making as well. To this end, each model state needs to be associated with a reward value. State rewards are computed using functions that quantify various aspects of the system, like performance and security concerns, or more concrete assets, like the number of active VMs or the actual deployment cost. As an example, consider the following utility function that uses weights to balance three aspects: the normalized probability of data leakage (p_dleak), data loss (p_dloss), and the latency exceeding a threshold (p_perf):

u(v, w) = a·p_dleak + b·p_dloss + c·p_perf, a + b + c = 1. (1)

Note that, in general, we can prioritize threats and objectives. We reflect this on the utility function by assigning different values to the weights. In addition, our approach is orthogonal to any user-defined utility function.

Our decision-making proposal is based on the computation of the cumulative reward of every adversary. The model solver examines the possible alternatives, that is, all combinations of state transitions, and computes the optimal cumulative state reward along with the corresponding sequence of actions. For example, using this utility function, the optimal reward is the minimum one. In Prism, we can do this with the help of a different type of PCTL specification that asks for reward minimization. If multiple adversaries have equal reward, we can use a second PCTL on other aspects, such as the probability of security violation as presented in Table 1, to choose the final strategy.

Moreover, it isn't necessary to perform all actions in that strategy. An elasticity decision-making technique tailored to NoSQL databases that's presented elsewhere follows the steps mentioned earlier.4 Each time the decision mechanism is activated, after deciding on the adversary, our proposal enacts only the first elasticity action and then reevaluates the whole adversary from scratch. Such an approach is in line with a wide range of adaptive solutions, such as model predictive control (MPC),15 which computes a sequence of adaptations but only applies the first step. According to evaluation results published elsewhere,4 the quality of elasticity decisions outperforms other proposals for scaling NoSQL databases in avoiding both violations of latency thresholds and overprovisioning of VMs.

Figure 6 shows how a security-aware elasticity decision maker behaves in a setting similar to the one described in earlier work.4 Although the load varies (green plot), the decision maker constantly reevaluates the number of VMs. The blue plot shows the number of VMs when considering only performance requirements, and the red plot shows the behavior when the state rewards are computed based on Equation 1. In the latter case, we limit the use of VMs to mitigate the threat of data leakage and loss.

Our approach is of interest to both owners of elastic applications and cloud service providers. The outcomes of our proposal can be used either to analyze (elastic) behavior or to make elasticity decisions. Additionally, the analysis results can be used to fine-tune the utility function, acting as a feedback mechanism, so that decisions are good in practice.

Analysis and decision making in elastic applications is, by its nature, an instance of autonomic computing problems. A key issue for autonomic solutions is to render them dependable and endow them with a solid formal basis. To this end, probabilistic model checking not only allows for the continuous verification of system properties but is also an effective tool for meeting both security- and performance-oriented goals.4,8
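The decision loop described above, enumerating strategies, scoring them with the cumulative utility of Equation 1, and enacting only the first action in the MPC spirit, can be sketched as follows. The transition and probability-lookup functions here are illustrative stand-ins for the instantiated MDP and Prism's solver, not the authors' implementation; the weight values are arbitrary.

```python
from itertools import product

ACTIONS = ("add", "remove", "no_change")

def utility(x, y, z, a=0.4, b=0.4, c=0.2):
    """Equation 1: u = a*p_dleak + b*p_dloss + c*p_perf, with a + b + c = 1."""
    return a * x + b * y + c * z

def best_first_action(n_vms, probs_for, depth=3, min_vms=1, max_vms=18):
    """Enumerate all action sequences up to `depth`, score each by its
    cumulative utility, and return the first action of the best (minimum
    cumulative utility) sequence. Only this first action is enacted; the
    whole search is then redone from scratch, as in model predictive
    control. `probs_for(n)` returns (x, y, z) for n VMs."""
    best_cost, best_first = float("inf"), "no_change"
    for seq in product(ACTIONS, repeat=depth):
        n, cost, feasible = n_vms, 0.0, True
        for act in seq:
            n += {"add": 1, "remove": -1, "no_change": 0}[act]
            if not min_vms <= n <= max_vms:
                feasible = False
                break
            x, y, z = probs_for(n)
            cost += utility(x, y, z)
        if feasible and cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first
```

With a `probs_for` in which leakage and loss probabilities grow with the VM count while performance-violation probability shrinks, the returned action balances the two pressures, mirroring the more conservative red plot in Figure 6. A real deployment would replace this brute-force enumeration with Prism's reward-minimization queries.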
Privacy-Preserving Access to Big Data in the Cloud
Peng Li, Song Guo, and Toshiaki Miyazaki, University of Aizu
Miao Xie and Jiankun Hu, University of New South Wales at the Australian Defense Force Academy
Weihua Zhuang, University of Waterloo
Islam and his colleagues prove that most of the protocols proposed for privacy-preserving cloud storage will leak access patterns due to efficiency issues.2 A deliberately designed attack that exploits access pattern leakage can disclose a significant amount of sensitive information, such as the identification of encrypted email queries. The private information retrieval (PIR) technique addresses the access privacy problem by letting users retrieve a block from a database of N items held by a server that learns nothing about this block.7 Unfortunately, Radu Sion and Bogdan Carbunar showed that existing PIR techniques will never be more efficient than a trivial PIR technique of downloading the entire database.8 PIR's extremely poor performance makes it inapplicable in cloud storage with big data.

ORAM provides data access privacy by periodically reshuffling data blocks stored in an untrusted server so user access can't be tracked. Michael Goodrich and Michael Mitzenmacher proposed an ORAM algorithm with O(√N) client storage to achieve O(log N) amortized cost; that is, each oblivious read or write leads to O(log N) data access operations on average.9 Elaine Shi and her colleagues further reduced the client storage to O(1).10 Later, Tarik Moataz and his colleagues proposed replacing homomorphic eviction with a new and much cheaper permute-and-merge eviction, so the block size can be reduced to Ω(log^4 N) while maintaining O(1) complexity.11

Jonathan Dautrich and his colleagues combined PIR techniques with the most bandwidth-efficient existing ORAM to reduce bandwidth cost.12 Chang Liu and his colleagues developed ObliVM, a programming framework for secure computation, and demonstrated it on various applications (for example, data mining, streaming algorithms, and graph algorithms).13 Xiangyao Yu and his colleagues proposed and evaluated PrORAM, a dynamic ORAM prefetching technique.14 A heuristic compact ORAM design, called SCORAM, is optimized for secure computation protocols. SCORAM is almost 10 times smaller in circuit size and faster than all other designs, so it's feasible to perform secure computations on gigabyte-sized datasets.15 Christopher Fletcher and his colleagues proposed a new ORAM structure, the PosMap lookaside buffer (PLB), and PosMap compression techniques that empirically reduce the performance overhead from recursive ORAM.16 Finally, researchers developed a novel fork path ORAM scheme that supports redundant memory accesses by leveraging three optimization techniques: path merging, ORAM request scheduling, and merging-aware caching.17

Oblivious RAM
As Oded Goldreich and Rafail Ostrovsky originally proposed, ORAM allows a trusted processor to use untrusted RAM.18,19 Most existing ORAM solutions use the basic memory structure suggested by Ostrovsky's hierarchical scheme.19 The ORAM is arranged in a series of progressively larger caches. Each cache consists of a hash table of buckets. When a block is requested, the algorithm checks a bucket at each level of the hierarchy. If it finds the block, it continues searching for a dummy block, thus hiding the desired block's location. Finally, the algorithm reinserts the block into the top-level cache. When a cache is close to overflowing, it's obliviously shuffled into the cache below.

Recent ORAM works include optimizations of the classic hierarchical scheme, such as the use of cuckoo hashing and Bloom filters.20 Peter Williams and Radu Sion proposed SR-ORAM, the first single-round-trip polylogarithmic time ORAM, which requires only logarithmic client storage.21 Taking only a single round trip to perform a query, SR-ORAM has an online communication/computation cost of O(log n log log n). Jacob Lorch and his colleagues proposed Shroud, a general storage system that hides data access patterns from the servers.22 Shroud uses many secure coprocessors acting in parallel as client proxies in the datacenter. Circuit ORAM, a new tree-based ORAM scheme, achieves optimal circuit size both in theory and in practice for realistic choices of block sizes.23

To define privacy, we denote a data access sequence as A = ((op1, u1, data1), (op2, u2, data2), . . .), where opi is the read or write operation, ui is the data address, and datai denotes the data contents. Given two data access sequences A and A′, a cloud storage is defined to be privacy-preserved if its access patterns can't be distinguished within polynomial time.

Next, we present the ORAM algorithm that we apply to the cloud storage scenario. We consider a client that wants to store and retrieve data in a cloud; the cloud is honest but curious, that is, it can't tamper with or modify the data, but it can learn information about the data. We divide the data into blocks, each of which is identified by a unique address. For example, a typical block size value is 64 or 256 Kbytes. Data stored on the cloud is organized as a tree, where each node, or bucket, stores several data blocks. Figure 1 shows an example binary tree structure. Note that any arbitrary tree structure is applicable in ORAM. Following previous work,10 we translate each read or write operation into two primitives, ReadAndRemove and Add, which are defined as follows:
Load Balancing of ORAM Deployment
To deploy an ORAM-based storage in a distributed system, we need to partition the corresponding tree among the storage servers.

Definition 1: The problem of load balance for deploying ORAM-based storage in clouds (LBOC). Given a tree-based ORAM structure and a set of storage servers, the LBOC problem seeks a data placement that minimizes the maximum access load among all servers.

Since a bucket is the minimum access unit in ORAM, we define binary variables x_ij to describe bucket placement, given by

x_ij = 1 if the ith bucket is placed on the jth server, and 0 otherwise.

Since each bucket has to be placed at only one server, we have the following constraint:

∑_{j=1}^{m} x_ij = 1, 1 ≤ i ≤ n. (1)

The total size of buckets deployed to a server cannot exceed its capacity; we represent this as

∑_{i=1}^{n} x_ij ≤ C_j, 1 ≤ j ≤ m. (2)

We define another variable, y_j, to denote the total access rate to the jth server. It can be calculated by

y_j = ∑_{i=1}^{n} a_i x_ij, 1 ≤ j ≤ m. (3)

The maximum access rate of all servers is constrained by Y, giving us

y_j ≤ Y, 1 ≤ j ≤ m. (4)

By summarizing these constraints, we formulate the LBOC problem as a mixed integer linear program (MILP), given by

LBOC: min Y
subject to (1), (2), (3), and (4),
x_ij ∈ {0, 1}, 1 ≤ i ≤ n, 1 ≤ j ≤ m.

Theorem 1: The LBOC problem is NP-hard.
Proof: We can prove the NP-hardness of the LBOC problem by a reduction from the well-known 2-partition problem. We complete the proof by following a process similar to that presented elsewhere.24

To solve the LBOC problem, we propose a fast heuristic algorithm whose basic idea is to first solve the MILP problem formulated by relaxing all integer variables, and then find a feasible integer solution by rounding the results. However, we're dealing with big data, and the corresponding formulation might be too large to solve directly. In each iteration, we only need to deal with a small-scale linear programming problem. Figure 2 shows the flowchart for the algorithm.

In the beginning, we initialize both the residual capacity C_j^res and current access load y_j^curr of the jth server to zero. In each iteration of the following while loop, we conduct bucket placement by solving a linear programming problem with respect to N, the current access load, and C_j^res on each server. The linear programming problem is the same as the LBOC formulation except that x_ij is relaxed so it can be a real variable between 0 and 1, as shown in (9) in Figure 2. In addition, we consider the current access load y_j^curr in constraint (5), and constrain the capacity of each server with C_j^res in (8).

We then sort the x_ij in descending order according to their values after solving this linear programming problem. For the ith bucket in set N, we place it in the server with the maximum value of x_ij, based on the expectation that a larger x_ij represents a higher probability of the corresponding optimal data placement. We finally update the values of C_j^res and y_j^curr, 1 ≤ j ≤ m, to finish this iteration.

Load Balancing for Dynamic Deployment
In practice, the data access rate might change with time. For example, some users intensively access their data during the daytime, but rarely connect to the cloud storage at night. Some companies retrieve their business data for statistical computation at night to minimize access conflicts with normal business during the day. In addition, irregular accesses exist because of unpredictable user activities. We're motivated to develop an online load-balancing algorithm to deal with dynamic data accesses.

A straightforward approach is to divide the time into discrete time slots and re-execute the algorithm for static deployment in each time slot according to the current access rates. Although this approach can always guarantee load balance, it will incur frequent data movement that consumes a large portion of network bandwidth among storage servers. To address this challenge, we propose an online algorithm that dynamically adjusts data placement for a tradeoff between load balance and data movement among servers. In each time slot, we need to make two decisions. First, we must decide whether data rebalancing is needed. We define a threshold, and conduct rebalancing if the gap between
contain a large number of variables and constraints the highest and lowest load servers are greater than
because of too many buckets in the ORAM tree. It the threshold . Otherwise, we keep the current data
would be time-consuming to solve such a large-scale placement. Second, we must decide how to move
linear programming problem. The basic idea of our data if load rebalancing is needed. Although we can
proposed algorithm is to iteratively place buckets on use the algorithm for static deployment, it might in-
Cjres = Cj , 1 j m
yjcurr = 0, 1 j m
WHILE
FOR
There are buckets that Put a set of unplaced buckets in set N
each xij in the order IF
havent been placed
the ith bucket isnt
Solve the following linear programming placed and Cjres > 0
min Y
yj + yjcurr Y, 1 j m; (5)
yj ai xij , 1 j m; (6)
iN
place this bucket on the jth server
m Cjres = Cjres 1;
j =1
x ij
= 1, i N; (7)
yjcurr = yjcurr + yj;
x
iN
ij
Cj , 1 j m;
res
(8)
0 xij 1, i N, 1 j m; (9)
IF
the ith bucket isnt
Sort xij in a descending order placed and Cjres > 0
END
3,500
OPT
cur a large amount of data movement because the ILB
3,000 RAND
optimization process ignores the current data place-
Maximum access load
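The relax-and-round heuristic above can be sketched in a few lines. The following is an illustrative Python implementation (not the authors' code): it assumes unit-size buckets, uses SciPy's general-purpose LP solver for the relaxed problem (5)-(9), and applies the greedy descending-x_ij rounding; the batching of buckets into set N is a simplification.

```python
import numpy as np
from scipy.optimize import linprog

def place_buckets(a, C, batch=None):
    """Iterative LP-relaxation heuristic for load-balanced bucket placement.

    a : access rate a_i of each bucket (length n)
    C : capacity (in unit-size buckets) of each server (length m)
    Returns an array `where` with where[i] = server index of bucket i.
    """
    n, m = len(a), len(C)
    assert sum(C) >= n, "total capacity must cover all buckets"
    C_res = np.array(C, dtype=float)   # residual capacities C_j^res
    y_curr = np.zeros(m)               # current access load y_j^curr
    where = np.full(n, -1)             # placement result
    unplaced = list(range(n))
    batch = batch or n                 # buckets handled per iteration

    while unplaced:
        N = unplaced[:batch]
        k = len(N)
        # Variables: x[t, j] flattened (k*m entries) followed by Y.
        c = np.zeros(k * m + 1); c[-1] = 1.0            # minimize Y
        A_ub = np.zeros((2 * m, k * m + 1)); b_ub = np.zeros(2 * m)
        for j in range(m):
            # (5)+(6): sum_i a_i x_ij + y_curr_j - Y <= 0
            for t, i in enumerate(N):
                A_ub[j, t * m + j] = a[i]
            A_ub[j, -1] = -1.0
            b_ub[j] = -y_curr[j]
            # (8): sum_i x_ij <= C_res_j
            for t in range(k):
                A_ub[m + j, t * m + j] = 1.0
            b_ub[m + j] = C_res[j]
        # (7): each bucket in N fully placed (fractionally)
        A_eq = np.zeros((k, k * m + 1)); b_eq = np.ones(k)
        for t in range(k):
            A_eq[t, t * m:(t + 1) * m] = 1.0
        bounds = [(0, 1)] * (k * m) + [(0, None)]       # (9) and Y >= 0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        if res.status != 0:
            raise RuntimeError(res.message)
        x = res.x[:k * m].reshape(k, m)
        # Round: scan the fractional x_ij in descending order, place greedily.
        for t, j in sorted(np.ndindex(k, m), key=lambda p: -x[p]):
            i = N[t]
            if where[i] == -1 and C_res[j] >= 1:
                where[i] = j
                C_res[j] -= 1           # unit-size buckets assumed
                y_curr[j] += a[i]       # realized load on server j
        unplaced = [i for i in unplaced if where[i] == -1]
    return where
```

For example, `place_buckets([3, 1, 2, 4], [2, 2])` must put exactly two buckets on each server; the LP relaxation steers the greedy rounding toward a balanced split of the access rates.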
Figure 4. Maximum access rate versus (a) different numbers of buckets and (b) different variance of server capacity. Our proposed algorithm, ILB, always outperforms the random algorithm.

Figure 6. Performance comparison: (a) maximum access load under different values of the threshold, and (b) traffic of data movement under different values of the threshold.
algorithm, and solving practical deployment issues in the near future.

Acknowledgments
This work was supported by the Japan Society for the Promotion of Science KAKENHI grant number 16K16038 and the Council for Science, Technology and Innovation (CSTI), Cross-Ministerial Strategic Innovation Promotion Program (SIP), Enhancement of Societal Resiliency against Natural Disasters (funding agency: JST). A part of this article was published in the proceedings of the 2014 IEEE Conference on Computer Communications Workshops.

References
1. D. Beaver et al., "Finding a Needle in Haystack: Facebook's Photo Storage," Proc. 9th USENIX Conf. Operating Systems Design and Implementation (OSDI), 2010, pp. 47-60.
2. M. Islam, M. Kuzu, and M. Kantarcioglu, "Access Pattern Disclosure on Searchable Encryption: Ramification, Attack, and Mitigation," Proc. Network and Distributed System Security Symp. (NDSS), 2012, pp. 1-15.
3. K.D. Bowers, A. Juels, and A. Oprea, "Hail: A High-Availability and Integrity Layer for Cloud Storage," Proc. 16th ACM Conf. Computer and Comm. Security, 2009, pp. 187-198.
4. A. Bessani et al., "DepSky: Dependable and Secure Storage in a Cloud-of-Clouds," Proc. 6th Conf. Computer Systems, 2011, pp. 31-46.
5. E. Stefanov et al., "Iris: A Scalable Cloud File System with Efficient Integrity Checks," Proc. 28th Ann. Computer Security Applications Conf., 2012, pp. 229-238.
6. Z. Wu et al., "SPANstore: Cost-Effective Geo-replicated Storage Spanning Multiple Cloud Services," Proc. ACM Symp. Operating Systems Principles, 2013, pp. 292-308.
7. B. Chor et al., "Private Information Retrieval," J. ACM, vol. 45, no. 6, 1998, pp. 965-981.
8. R. Sion and B. Carbunar, "On the Computational Practicality of Private Information Retrieval," Proc. Network and Distributed Systems Security Symp., 2007, pp. 1-10.
9. M.T. Goodrich and M. Mitzenmacher, "MapReduce Parallel Cuckoo Hashing and Oblivious RAM Simulations," CoRR, vol. abs/1007.1259, 2010.
10. E. Shi et al., "Oblivious RAM with O((log N)^3) Worst-Case Cost," Advances in Cryptology (ASIACRYPT 11), 2011, pp. 197-214.
11. T. Moataz, T. Mayberry, and E.-O. Blass, "Constant Communication ORAM with Small Blocksize," Proc. ACM SIGSAC Conf. Computer and Comm. Security, 2015, pp. 862-873.
12. J. Dautrich and C. Ravishankar, "Combining ORAM with PIR to Minimize Bandwidth Costs," Proc. 5th ACM Conf. Data and Application Security and Privacy (CODASPY), 2015, pp. 289-296.
13. C. Liu et al., "ObliVM: A Programming Framework for Secure Computation," Proc. IEEE Symp. Security and Privacy, 2015, pp. 359-376.
14. X. Yu et al., "ProRAM: Dynamic Prefetcher for Oblivious RAM," Proc. ACM/IEEE Ann. Int'l Symp. Computer Architecture (ISCA), 2015, pp. 616-628.
15. X.S. Wang et al., "SCORAM: Oblivious RAM for Secure Computation," Proc. ACM SIGSAC Conf. Computer and Comm. Security, 2014, pp. 191-202.
16. C.W. Fletcher et al., "FreeCursive ORAM: [Nearly] Free Recursion and Integrity Verification for Position-Based Oblivious RAM," Proc. 20th Int'l Conf. Architectural Support for Programming Languages and Operating Systems, 2015, pp. 103-116.
17. X. Zhang et al., "Fork Path: Improving Efficiency of ORAM by Removing Redundant Memory Accesses," Proc. 48th Int'l Symp. Microarchitecture (MICRO), 2015, pp. 102-114.
18. O. Goldreich, "Towards a Theory of Software Protection and Simulation by Oblivious RAMs," Proc. ACM Symp. Theory of Computing, 1987, pp. 182-194.
19. R. Ostrovsky, "Efficient Computation on Oblivious RAMs," Proc. 22nd Ann. ACM Symp. Theory of Computing (STOC), 1990, pp. 514-523.
20. O. Goldreich and R. Ostrovsky, "Software Protection and Simulation on Oblivious RAMs," J. ACM, vol. 43, no. 3, 1996, pp. 431-473.
21. P. Williams and R. Sion, "Single Round Access Privacy on Outsourced Storage," Proc. ACM Conf. Computer and Comm. Security (CCS), 2012, pp. 293-304.
22. J.R. Lorch et al., "Shroud: Ensuring Private Access to Large-Scale Data in the Data Centers," Proc. USENIX Conf. File and Storage Technologies (FAST), 2013, pp. 199-213.
23. X. Wang, H. Chan, and E. Shi, "Circuit ORAM: On Tightness of the Goldreich-Ostrovsky Lower Bound," Proc. ACM SIGSAC Conf. Computer and Comm. Security, 2015, pp. 850-861.
24. P. Li and S. Guo, "Load Balancing for Privacy-Preserving Access to Big Data in Cloud," Proc. IEEE Conf. Computer Comm. Workshops, 2014, pp. 524-528.

Peng Li is an associate professor in the School of Computer Science and Engineering at the University of Aizu, Japan. His research interests include wireless communication and networking, specifically wireless sensor networks, green and energy-efficient mobile networks, cross-layer optimization for wireless networks, cloud computing, big data processing, and smart grid. Li has a PhD in computer science and engineering from the University of Aizu, Japan. He's a member of IEEE. Contact him at pengli@u-aizu.ac.jp.

Song Guo is a full professor in the Department of Computing at the Hong Kong Polytechnic University. His research interests include cloud computing, big data, wireless networks, and cyberphysical systems. Guo has a PhD in computer science from the University of Ottawa. He's a senior member of IEEE, a senior member of ACM, and an IEEE Communications Society Distinguished Lecturer. Contact him at song.guo@polyu.edu.hk.

Toshiaki Miyazaki is a professor in the School of Computer Science and Engineering and the dean of the Undergraduate School of Computer Science and Engineering at the University of Aizu, Fukushima, Japan. His research interests include reconfigurable hardware systems, adaptive networking technologies, and autonomous systems. Miyazaki has a PhD in electronic engineering from the Tokyo Institute of Technology. He's a senior member of IEEE, the Institute of Electronics, Information, and Communication Engineers, and the Information Processing Society of Japan. Contact him at miyazaki@u-aizu.ac.jp.

Miao Xie is a PhD student in the School of Engineering and IT, University of New South Wales at the Australian Defence Force Academy. His research interests include intrusion/anomaly detection in wireless sensor networks, network security, data mining, and forecasting algorithms. Xie has a master's degree in engineering and IT from the University of New South Wales. Contact him at m.xie@adfa.edu.au.

Jiankun Hu is a professor and research director at the Cyber Security Lab, School of Engineering and IT, University of New South Wales at the Australian Defence Force Academy. His research interests include cybersecurity and biometrics security. Hu has a PhD in control engineering from the Harbin Institute of Technology, China. He's a member of IEEE. Contact him at j.hu@adfa.edu.au.

Weihua Zhuang is a full professor in the Department of Electrical and Computer Engineering at the University of Waterloo, Canada. Her research interests include multimedia wireless communications, wireless networks, and radio positioning. Zhuang has a PhD in electrical engineering from the University of New Brunswick, Canada. Contact her at wzhuang@bbcr.uwaterloo.ca.
Cryptographic Public Verification of Data Integrity for Cloud Storage Systems
Yuan Zhang, Chunxiang Xu, and Hongwei Li, University of Electronic Science and Technology of China
Figure 1. System model. There are three entities in the public verification scheme: user, cloud server (cloud service provider), and auditor.

message, the cloud server generates corresponding proof information and sends it to the auditor. After receiving this information, the auditor verifies the data integrity by checking the proof information's validity. If the verification fails, the auditor informs the user that the data might be corrupted. Because we use HLA, the proof information generated by the cloud server doesn't include the data file; thus the communication overhead between the cloud server and auditor is low.

Basic Public Verification Scheme
Because public verification aims to enable a third-party auditor to efficiently and securely verify the integrity of outsourced data, these schemes should be evaluated using both systems and crypto criteria. Systems criteria include:

Efficiency. A public verification scheme should be as efficient as possible in terms of communication and computation overhead.
Boundless verification. A public verification scheme should enable auditors to verify data integrity without an a priori bound on the number of verification interactions.
Stateless auditor. Auditors should be stateless and shouldn't need to maintain and update state during verification.

Crypto criteria include:

Soundness. Any time a cloud server passes the auditor's verification, it must possess the specified data intact. This should be provably secure under the security model proposed by Hovav Shacham and Brent Waters.11
Resistance against external adversaries. A secure public verification scheme should resist common attacks, where an external, active, and online adversary modifies the outsourced data.

These influencing factors aren't isolated and can be closely related; thus, public verification techniques should be evaluated from both cryptography and engineering perspectives.

Warm-up Scheme
We first review the public verification scheme proposed by Shacham and Waters (SWP),11 which involves a user U, cloud server C, and auditor A. SWP consists of five algorithms.

Setup. With a security parameter, U determines the bilinear map e: G × G → GT.11 Then U chooses secret parameters and generates the public parameters (v = g^α, u_1, ..., u_s), where α is secret and g is the generator of the multiplicative group G.

Store. User U transforms its data M into n blocks and further splits each block into s sectors. User U chooses a random element name for file naming and computes a file tag τ on name and the public parameters, which enables A to check the validity of the public parameters used to check the data integrity. In other words, the validity of τ ensures that C can't deceive A by replacing the public parameters. Then, U generates a tag σ_i for each data block. The tag σ_i is based on the BLS signature,12 which is the HLA, and allows multiple tags to be aggregated into a single tag, where the size of the aggregated tag is independent of the number of tags to be aggregated, and a verifier can confirm the validity of the aggregated tag instead of checking the tags one by one. Finally, U outsources the data file, file tag, and σ_i to C.

Audit. For each verification task, A first determines I and randomly chooses v_i, i ∈ I, where I is a random subset of the set {1, ..., n} that determines which data blocks should be verified, and v_i is a random element for each verification. Next, A sends the challenge message chal = {(i, v_i)}_{i∈I} to C.

Prove. After receiving chal, C verifies τ, and then sends

σ = Π_{i∈I} σ_i^{v_i} and μ_j = Σ_{i∈I} v_i m_ij (j ∈ [1, s]).
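At its core, the Audit/Prove exchange above is a random linear combination of the challenged blocks. The toy Python sketch below illustrates only that arithmetic; the bilinear map, the BLS tags σ_i, and the verification step are omitted, and the modulus is an arbitrary stand-in for the group order.

```python
import random

# Toy prime standing in for the group order; real SWP works in a
# pairing-friendly group and also aggregates BLS tags (omitted here).
p = 2**61 - 1

def audit(n, c, rng):
    """Auditor A: pick a random subset I of c block indices and a random
    coefficient v_i for each one -- the challenge chal = {(i, v_i)}."""
    return [(i, rng.randrange(1, p)) for i in rng.sample(range(n), c)]

def prove(blocks, chal):
    """Cloud server C: for each sector j, mu_j = sum_i v_i * m_ij (mod p).
    Only s field elements are returned -- no data blocks travel back."""
    s = len(blocks[0])
    return [sum(v * blocks[i][j] for i, v in chal) % p for j in range(s)]

rng = random.Random(7)
blocks = [[rng.randrange(p) for _ in range(2)] for _ in range(8)]  # n=8, s=2
chal = audit(len(blocks), 3, rng)
mu = prove(blocks, chal)
assert len(mu) == 2  # response size depends on s, not on |I|
```

The response size is s field elements regardless of how many blocks are challenged, which is where the scheme's low communication overhead comes from.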
Table 1. The log file L used to verify data integrity (in L, j ∈ [1, s]; k indexes verifications).
Bitcoin hash | Authenticator | Random element | Data
Bl_t^(1) | σ^(1) | R_j^(1) | μ_j^(1)
Bl_t^(2) | σ^(2) | R_j^(2) | μ_j^(2)
... | ... | ... | ...
Bl_t^(k) | σ^(k) | R_j^(k) | μ_j^(k)

The response μ_j = Σ_{i∈I} v_i m̃_ij is computed over the modified blocks m̃_ij = m_ij + l_ij, but the aggregated tag still corresponds to m_ij rather than m̃_ij (∀i ∈ I, j ∈ [1, s]). To deceive A and pass the verification, the adversary can eavesdrop the challenge message chal, intercept the proof information, and compute

μ̃_j = μ_j − Σ_{(i, v_i)∈chal} v_i l_ij (1 ≤ j ≤ s).

Finally, the adversary sends the modified proof information to A, and the modified data file passes the verification.

Vulnerability against Malicious Auditors
A corrupted auditor can deceive C and U in several ways.
First and simplest, a malicious auditor can claim that the outsourced data is (not) retained intact in the cloud, no matter what the verification result is, even if the malicious auditor doesn't perform the verification. Because C and U trust the auditor, they will accept its claim without doubt.
Second, the malicious auditor can collude with C to deceive U. In this case, the outsourced data has been corrupted, but the auditor generates a biased challenge message to check only the data blocks that aren't corrupted.
Third, the malicious auditor can collude with U to circumvent C. That is, the outsourced data is retained in good condition, but the auditor claims that it's been corrupted.
SWP can't protect against malicious auditors, so it must bear a strong assumption: auditors are honest and reliable. Resisting malicious auditors is thus a worthwhile area for further study.

Protecting against Malicious Auditors and External Adversaries
As discussed earlier, malicious auditors and external adversaries can invalidate the SWP technique. Most existing public verification schemes follow SWP, and thus have the same framework and threat model. Consequently, these schemes can't protect against external adversaries and malicious auditors.
An external adversary can invalidate SWP, since there's a definite linear relationship between the proof information (μ_j) and the data blocks (m_ij). To resist external adversaries without secure channels, we adopt a random masking technique when computing the proof information. Specifically, we use random masking as a nonlinear disturbance code to change the definite linear relationship between the proof information and the data blocks into a nonlinear relationship.
To resist malicious auditors, the auditor's behavior (that is, whether the auditor performs the established verification) should be checked. In the enhanced scheme, the auditor is required to generate an entry for each verification task and store it in a log file. The user audits the auditor's behavior by checking the log file's validity, guaranteeing that a malicious auditor can't fabricate a verification result to deceive the user and/or cloud server. Here, we want to further emphasize that the periodicity of the user's audits of the auditor should be much longer than the periodicity of the auditor's verification of the data's integrity.
However, such a paradigm can't deter malicious auditors perfectly, since a malicious auditor can still deceive the user by generating a biased challenge message, where the corrupted data blocks will never be checked. For security and efficiency reasons, it's also impractical to require the user to generate a new challenge message for each verification task. To address this problem, we use Bitcoin to construct the challenge message. Given a determinate time t, if t is a past or current time, we can easily find the Bitcoin block generated nearest to time t; however, if t is a future time, the Bitcoin block that will be generated at t is unpredictable. Here, we denote the hash of a Bitcoin block generated at a past time t as Bl_t. Because Bitcoin has this property, we can consider it a time-based pseudorandomness source: we can compute this source's output when its input is a past or current time; otherwise, the output is unpredictable.
As Figure 2 shows, the enhanced scheme consists of six algorithms: Setup, Store, Audit, Prove, Verify, and CheckLog. In our enhanced scheme, the first two algorithms are the same as those in SWP.
In Audit, the auditor first acquires Bl_t based on the current time t and initializes a pseudorandom bit generator as GetRand(Bl_t). Then the auditor generates the challenge message {(i, v_i)}_{i∈I} from this generator.
Figure 2. Execution steps of our scheme. Different algorithms are performed by different entities, and the details are shown in the figure.

In Prove, C randomly chooses r ∈ G as a secret parameter and computes

R_j = u_j^r, σ = Π_{i∈I} σ_i^{v_i}, μ*_j = Σ_{i∈I} v_i m_ij, and μ_j = r^{-1}(μ*_j + h(R_j)),

where h(·) is a BLS hash. Then, C sends {σ, μ_j, R_j}_{j∈[1,s]} to A.

In Verify, upon receiving the proof information {σ, μ_j, R_j}_{j∈[1,s]}, the auditor verifies

e(σ, g) = e(Π_{i∈I} H(i || name)^{v_i} · Π_{j=1}^{s} R_j^{μ_j} · Π_{j=1}^{s} u_j^{h(R_j)}, v). (1)

If the verification holds, the auditor creates an entry as (Bl_t, σ, μ_j, R_j)_{j∈[1,s]}. Finally, the auditor stores the entry in a log file L, as shown in Table 1 (in L, j ∈ [1, s]), where k denotes the index of the verification that the auditor performs. That is, {Bl_t^(k), σ^(k), R_j^(k), μ_j^(k)} is the proof information of the kth verification.

In CheckLog, to check the validity of L, the user first picks a random subset B of indices of Bitcoin blocks and generates a set of challenge messages I^(B) = {{i^(1), v_i^(1)}_{i∈[I^(1)]}, ..., {i^(b), v_i^(b)}_{i∈[I^(b)]}}, where b is the size of the subset B. Then the user sends B to the auditor and receives σ^(B), R_j^(B), μ_j^(B) (j ∈ [1, s]). Finally, the user audits the auditor by checking the aggregated form of Equation 1 over the verifications in B:

e(σ^(B), g) = e(Π_{k∈B} Π_{i∈I^(k)} H(i || name)^{v_i^(k)} · Π_{k∈B} Π_{j=1}^{s} (R_j^(k))^{μ_j^(k)} · Π_{k∈B} Π_{j=1}^{s} u_j^{h(R_j^(k))}, v).

If Equation 1 fails, the user can consider that the cloud-stored data is corrupted, and that the TPA and/or C are malicious.

Remark
Unlike some previous schemes,4,7,8,13 we don't consider privacy protection of user data against the auditor. An auditor in our scheme could be compromised or could collude with the cloud server, which might reveal the user's data to the auditor.
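The nonlinearity introduced by the masking can be illustrated with toy modular arithmetic. This is an illustrative stand-in for the actual pairing group: u, q, and h below are placeholders, not the scheme's real parameters.

```python
import hashlib
import random

q = 2**61 - 1  # toy prime standing in for the group order
u = 3          # toy public base standing in for u_j

def h(x: int) -> int:
    """Stand-in for the BLS hash: map an element to a field value."""
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big") % q

def masked_response(mu_star: int, r: int):
    """Random masking: R = u^r, mu = r^{-1} (mu* + h(R)) mod q."""
    R = pow(u, r, q)
    return R, (pow(r, -1, q) * (mu_star + h(R))) % q

r = random.Random(1).randrange(2, q)      # C's per-proof secret
R, mu = masked_response(1000, r)
R2, mu2 = masked_response(1000 + 77, r)   # mu* shifted by sum(v_i * l_ij) = 77
# The two valid responses differ by r^{-1} * 77, which needs the secret r:
assert (mu2 - mu) % q == (pow(r, -1, q) * 77) % q
```

With plain SWP, shifting the blocks by l_ij shifts μ_j by the publicly computable Σ v_i l_ij; here the same shift is scaled by the unknown r^{-1}, so an eavesdropping adversary can no longer patch an intercepted proof.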
Exploiting data encryption before outsourcing is an easy and affordable way to achieve such privacy protection.

Security and Performance Evaluation
We first analyze the security of the proposed scheme using the crypto criteria proposed earlier.
With the proposed scheme, if the cloud server passes the auditor's verification, it ensures that the verified data is intact. This claim is provably secure under the Shacham and Waters security model,11 and the formal proof is presented elsewhere.8 Therefore, our proposed scheme achieves the soundness criterion.
Next, we show that the proposed scheme can resist an external adversary. The adversary first intrudes into the cloud server and modifies m_ij to m̃_ij = m_ij + l_ij. In Prove, the adversary intercepts the proof information {σ, μ_j, R_j}_{j∈[1,s]}, where

μ_j = r^{-1}(μ*_j + h(R_j)) and μ*_j = Σ_{i∈I} v_i m̃_ij.

Since r and r^{-1} are unknown to the adversary, and

μ_j = r^{-1}(μ*_j + h(R_j)) = r^{-1}(Σ_{i∈I} v_i m_ij + h(R_j)) + r^{-1} Σ_{i∈I} v_i l_ij,

the adversary can't compute

r^{-1} Σ_{i∈I} v_i l_ij.

Thus, such an attack is computationally infeasible.
Finally, we show that the proposed scheme can protect against a malicious auditor. Because the proposed scheme meets the soundness criterion, the cloud server and malicious auditor can't forge a proof that passes the data integrity verification. In addition, the user will audit the auditor's behavior, and the auditor must execute the established verification. Furthermore, because the challenge message is determined by the time-based pseudorandom source (that is, Bitcoin) and can be recovered by the user, the auditor can't deceive the user by generating a biased challenge message. In other words, even if the auditor colludes with the cloud server, it can't deceive the user by checking only the uncorrupted data's integrity. Therefore, malicious auditors can't invalidate the proposed scheme. A formal and detailed proof is presented elsewhere.10
All in all, the proposed scheme meets the crypto criteria proposed earlier.

Next, we analyze the proposed scheme's performance in terms of communication and computation overhead. We ran all the experiments on a Windows 7 system with an Intel Core i5 CPU running at 2.53 GHz with 2 Gbytes of DDR3 RAM (1.74 Gbytes available). We implemented all algorithms in the C language, and our code uses MIRACL library version 5.6.1. We use an MNT elliptic curve12 and a security level of 80 bits. The difference among choices of s is discussed elsewhere.11 For simplicity, we give the atomic operation analysis for the case s = 1 in the following.
We first analyze the communication overhead between the cloud server and auditor. In Audit and Prove, the auditor sends the challenge message to the cloud server, and the cloud server responds to the auditor with the proof information. The size of the challenge message is c(|i| + |v_i|), where |i| denotes the size of i. In our scheme, i and v_i are random numbers with 80 bits under the 80-bit security level. The size of the proof information is |σ| + |μ_j| + |R_j|, which is approximately equal to 80 bytes under the 80-bit security level. Therefore, the total communication cost between the auditor and cloud server is approximately equal to 90 bytes for each verification task.
Next, we analyze our scheme's computation overhead. Because the performance analysis on the auditor side of the proposed scheme is presented elsewhere,8 we only analyze the additional cost on the user side to protect against the malicious auditor. In CheckLog, the user performs Equation 1 to audit the auditor's behavior, which is the only additional computational overhead for the user. Figure 3 shows the additional verification overhead on the user side under different numbers of challenge entries. As Figure 3 shows, the user in the proposed scheme can audit the auditor with high efficiency. In other words, compared with SWP, the proposed scheme requires higher verification costs on the user side, but this extra cost is exactly the guarantee needed to resist the malicious auditor, and so is a worthwhile sacrifice.

We plan to focus our future research efforts in several areas.
Most existing public verification schemes are based on the public-key cryptosystem. In these schemes, even when the auditor is equipped with a powerful device, verification is a second-long (hundreds of milliseconds) computation. For these schemes, it would be impractical for the auditor to verify the data integrity using a low-power device. Reducing the computation from operations in public-key cryptography to those in symmetric-key cryptography is one such area.
Yuan Zhang is a PhD student in computer science and engineering at the University of Electronic Science and Technology of China (UESTC), Chengdu. His research interests include cryptography, network security, and cloud computing security. Zhang has a BSc from UESTC. He's a student member of IEEE. Contact him at ZY_LoYe@126.com.

Chunxiang Xu is a professor of computer science and technology at the University of Electronic Science and Technology of China. Her research interests include information security, cloud computing security, and cryptography. Xu has a PhD from Xidian University. She's a member of IEEE. Contact her at chxxu@uestc.edu.cn.

Hongwei Li has a PhD in software and theory from UESTC. He's a member of IEEE, the China Computer Federation, and the China Association for Cryptologic Research. Contact him at hongweili@uestc.edu.cn.

Xiaohui Liang is an assistant professor in the Department of Computer Science at the University of Massachusetts, Boston. His research interests include applied cryptography, and security and privacy issues for e-healthcare systems, cloud computing, mobile social networks, and smart grids. Liang has a PhD in electrical and computer engineering from the University of Waterloo, Canada. He's a member of IEEE. Contact him at Xiaohui.Liang@umb.edu.
To Docker or Not to Docker: A Security Perspective
Theo Combe, Telecom ParisTech
Antony Martin and Roberto Di Pietro, Nokia Bell Labs
VM VM VM VM VM VM Docker daemon
Figure 1. Comparing various application runtime models: (a) a type 1 hypervisor, (b) a type 2 hypervisor, and (c) a container.
in this. Finally, Docker is already running in some example, Berkeley Software Distribution (BSD) jails
environments, making it possible to run experi- and chroot can be considered an early form of con-
ments and explore the practicality of some attacks. tainer technology. Recent Linux-based container so-
lutions rely on kernel supportthat is, a userspace
Containerization and Dockerization in a library to provide an interface to syscalls and front-
Growing Ecosystem end applications. There are two main kernel imple-
Cloud applications have typically leveraged virtu- mentations: Linux container (LXC) implementations
alization. However, several factorsincluding ac- using cgroups and namespaces, and the OpenVZ
celeration of the development cycle (such as agile patch. Table 1 shows the most popular implementa-
methods and DevOps), an increasingly complex ap- tions and their dependences.
plication stack (mostly Web services and their frame- Containers can be integrated in a multitenant
works), and market pressure to densify applications environment, thus profiting from resource sharing
on servershave triggered the need for a fast, easy- to increase average hardware use. This is achieved
to-use way of pushing code into production. by sharing the kernel with the host machine. In-
deed, unlike VMs, containers dont embed their own
Linux Containers kernel, but rather run directly on the host kernel.
Figure 1 shows how virtualization hypervisors (Fig- This shortens the syscalls execution path by remov-
ures 1a and 1b) compare to a container (Figure 1c), ing the guest kernel and the virtual hardware layer.
which provides near-bare-metal performance1 and Additionally, containers can share software resourc-
offers the possibility of seamlessly running mul- es (such as libraries) with the host, hence avoiding
tiple versions of applications on the same machine. code duplication. The absence of kernel and some
New instances of containers can be created quasi- system libraries make containers very lightweight
instantly to face a customer demand peak, which is (image sizes can shrink to a few megabytes), which
convenient for spawning applications on-demand or enables a quick boot process.
quickly moving a service, such as when implement-
ing network function virtualization (NFV). Docker
Containers have long existed in various forms As Figure 2 shows, the Docker ecosystem includes
that differ by the level of isolation they provide. For various components. Docker provides a specification
Se p t e m b e r / Oc t o b e r 2 0 1 6 I EEE Clo u d Co m p u t i n g 55
Cloud Security
Linux Docker libcontainer cgroups + namespaces + capabilities iptables, perl, Apparmor, sqlite, Go
containers + kernel version 3.10 or above
(LXC)
LXC liblxc cgroups + namespaces + capabilities Go
for container images and runtime, including Dockerfiles that allow a reproducible building process (Figure 2a). Docker software implements this specification using the Docker daemon, known as the Docker engine. The repositories include a central repository, the Docker hub, which lets developers upload and share their images, along with a trademark and bindings with third-party applications (Figure 2b). Finally, the build process fetches code from external repositories and holds the packages that will be embedded in the images (Figure 2c). Docker is written in the Go language and was first released in March 2013.

Docker specification. The specification's scope is container images and runtime. Docker disk images are composed of a set of layers, along with metadata in the JavaScript Object Notation (JSON) format. The images are stored at /var/lib/docker/<driver>/, where <driver> stands for the storage driver being used, such as the advanced multi-layered unification filesystem (aufs), B-tree file system (Btrfs), virtual filesystem switch (VFS), device mapper, or OverlayFS. Each layer contains the filesystem modifications relative to the previous layer, starting from a base image (typically, a lightweight Linux distribution). Docker organizes the images in trees; each image has a parent, except for the base images, which are the roots of the trees. This structure allows Docker to ship in an image only the modifications specifically related to it.
Docker can build images in two ways. It can launch a container from an existing image (docker run), perform modifications and installations inside the container, and then stop the container and save its state as a new image (docker commit). This process is close to the classical VM installation, but must be performed at each image rebuild (such as for updates); because the base image is standardized, the sequence of commands is exactly the same. To automate this process, Dockerfiles (Figure 2a) let users specify a base image and a sequence of commands to be performed to build the image, along with other options (such as exposed ports) specific to the image. The image is then built with the docker build command.

Docker internals. Docker containers create a wrapped, controlled environment on the host machine in which arbitrary code can (ideally) be run safely. This isolation is achieved through two main kernel features, kernel namespaces8 and control groups (cgroups), that were merged starting from Linux kernel version 2.6.24. Namespaces are used to split the view that processes have of the system. Currently, the kernel has six different namespaces (PID, IPC, NET, MNT, UTS, and USER) that isolate various aspects of the system. Each of these namespaces has its own kernel-internal objects related to its type, and each gives processes a local instance of some paths in the /proc and /sys filesystems. The Linux namespaces' isolation role is detailed elsewhere.3 The cgroups are a kernel mechanism to restrict the resource usage of a process or group of processes. Their goal is to prevent a process from taking all available resources and starving other processes and containers on the host. Controlled resources include CPU shares, RAM, network bandwidth, and disk I/O.

The Docker daemon. The Docker software runs as a daemon on the host machine. It can launch containers [...]
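The two build paths just described can be sketched with a short example. The image and tag names are hypothetical, and the commands assume a host with a running Docker daemon, so treat this as a sketch rather than a tested recipe:

```shell
# Path 1: interactive build, snapshotted with docker commit.
docker run -it --name builder debian /bin/bash
#   ... install and configure software inside the container, then exit ...
docker commit builder example/web:1.0

# Path 2: reproducible build from a Dockerfile.
# Dockerfile contents:
#   FROM debian
#   RUN apt-get update && apt-get install -y nginx
#   EXPOSE 80
docker build -t example/web:1.0 .
```

Path 2 is what makes rebuilds reproducible: the same Dockerfile applied to the same standardized base image yields the same layers.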
Isolation
Docker containers rely exclusively on Linux kernel features, including namespaces, cgroups, hardening, and capabilities. Namespace isolation and capabilities drop are enabled by default, but cgroups limitations aren't; they must be enabled on a per-container basis through -a -c options on container launch.
The default isolation configuration is relatively strict. The only flaw is that all containers share the same network bridge, enabling Address Resolution Protocol (ARP) poisoning attacks between containers on the same host.

Figure 2. The Docker ecosystem. (a) Docker specifies container images and runtime, including Dockerfiles that enable a reproducible building process. (b) The Docker repositories. (c) The build process. Arrows show the code path and associated commands (docker <action>).
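The namespace isolation this design relies on is directly observable on any Linux host: /proc exposes one entry per namespace a process belongs to, and a process inside a container points at different entries than processes on the host. A minimal, Linux-only sketch (guarded so it is a no-op elsewhere):

```shell
# Each symlink under /proc/<pid>/ns names one namespace the process
# belongs to (ipc, mnt, net, pid, uts, user, ...). Two processes share
# a namespace exactly when the symlink targets match.
if [ -d /proc/self/ns ]; then
    ls /proc/self/ns
fi
```

Comparing /proc/1/ns and the /proc/<pid>/ns of a containerized process (as root) shows which namespaces Docker has unshared for that container.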
However, as we describe in more detail later, Docker's global security can be lowered by options, triggered at container launch, that give containers extended access to some parts of the host. Additionally, security configuration can be set globally through options passed to the Docker daemon. This includes options lowering security, such as the insecure-registry option, which disables the Transport Layer Security (TLS) certificate check on a particular registry. Options that increase security, such as the icc=false parameter (which forbids network communications between containers and mitigates the ARP poisoning attack), are available, but they prevent multicontainer applications from operating properly, and hence are rarely used.

Host Hardening
Host hardening through Linux kernel security modules enforces security constraints on containers, mitigating attacks such as compromising a container and escaping to the host operating system. Currently, SELinux, Apparmor, and Seccomp are supported, with default profiles available. These profiles are generic, not restrictive. The docker-default Apparmor profile9 (https://wikitech.wikimedia.org/wiki/Docker/apparmor), for example, allows Docker containers full access to the filesystem and network, with all capabilities. Similarly, the default SELinux policy puts all Docker objects in the same domain. Therefore, while default hardening protects the host from containers, it doesn't protect containers from other containers. This security aspect can be addressed by writing specific profiles tailored to individual containers.

Network Security
Docker uses network resources for image distribution and remote control of the Docker daemon.
To distribute images, Docker verifies images downloaded from a remote repository with a hash, and the connection to the registry is made over TLS (unless explicitly specified otherwise). Moreover, the Docker Content Trust architecture now lets developers sign their images before pushing them to a repository.10 Content Trust relies on The Update Framework (TUF),11 which was specifically designed to address package manager flaws.12 TUF can recover from a key compromise, mitigate replay attacks by embedding expiration timestamps in signed images, and so on. The tradeoff is complex key management; TUF actually implements a public-key infrastructure in which each developer owns a root key (an offline key) that is used to sign signing keys, which in turn are used to sign Docker images.
The Docker daemon is remote-controlled through a socket, making it possible to perform any Docker command from another host. By default, the socket used to control the daemon is a Unix socket, located at /var/run/docker.sock and owned by root:docker, but it can be changed to a TCP socket. Access to this socket lets attackers pull and run any container in privileged mode, thereby giving them root access to the host. In the case of a Unix socket, a user who is a member of the docker group can gain root privileges; when a TCP socket is used, any connection to this socket can give root privileges on the host. Therefore, the connection must be secured with TLS (tlsverify), which enables both encryption and authentication of the two sides of the connection (and requires additional certificate management).

Docker Usages: Security Challenges
Most of the security discussions about containers compare them to VMs, thus assuming both technologies are equivalent in terms of design. Although this is the aim of some container technologies (such as OpenVZ, which is used to spawn virtual private servers), recent lightweight container solutions such as Docker were designed to achieve completely different objectives than those of VMs. Therefore, it's important to examine Docker's typical usages to discuss their security implications and how they affect Docker's security.

Docker Usages
We can distinguish three types of Docker usages. Recommended usages are those that Docker was designed for, as explained in the official documentation. Docker developers recommend a microservices approach13; that is, a container must host a single service, in a single process or in a daemon spawning children. Therefore, a Docker container isn't considered a VM: there's no package manager, no init process, no sshd to manage it. All administration tasks (container stop, restart, backups, updates, builds, and so on) must be performed via the host machine, which implies that the legitimate container's admin has root access to the host.
Docker developers also recommend a reproducible and automated deployment of applications. Docker images should be built anywhere through a generic build file (Dockerfile), which specifies the steps to build the image from a base image. This generic way of building images makes the process and the resulting images almost host-agnostic, depending only on the kernel and not on the installed libraries.
Widespread usages include common usages of Docker by application developers and system administrators. Some system administrators or developers use Docker as a way to ship complete virtual environments and update them regularly, turning their containers into VMs. Although this is convenient because it limits system administration tasks to the bare minimum (such as docker pull), as we describe later, it has several security implications. With containers embedding enough software to run a full system (logging daemon, ssh server, and sometimes even an init process), it's tempting to perform administration tasks from within the container.
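The socket-hardening advice above can be sketched with the flags of the Docker 1.x era this article describes. The certificate file names and host names are hypothetical and the certificates must be generated beforehand, so this is a sketch rather than a complete procedure:

```shell
# Daemon side: listen on TCP, but require and verify TLS client certificates.
docker daemon \
    --tlsverify \
    --tlscacert=ca.pem --tlscert=server-cert.pem --tlskey=server-key.pem \
    -H tcp://0.0.0.0:2376

# Client side: authenticate the server and present a client certificate.
docker --tlsverify \
    --tlscacert=ca.pem --tlscert=cert.pem --tlskey=key.pem \
    -H tcp://dockerhost.example.net:2376 ps

# Image signing (Docker Content Trust) is opt-in via an environment variable;
# pulls then fail unless the requested tag carries valid signatures.
export DOCKER_CONTENT_TRUST=1
docker pull example/web:1.0
```

Note the tradeoff the article mentions: both tlsverify and Content Trust push key and certificate management onto the operator.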
are working as expected in both the recommended and PaaS usages (that is, there are no implementation vulnerabilities or CVEs), a privilege boundary between the containers and the host machine has been designed. Technical controls supporting the boundary include the isolation of processes through namespaces, resources management through cgroups, and (by default) limited communication capabilities between the containers and the host.
In contrast, widespread usages take advantage of options, given either to the Docker daemon on startup or to the command launching a container, that give containers extended access to the host. When used with untrusted containers, these options trigger many security concerns, including the following:

- options giving containers extended access to the host (net=host, uts=host, privileged, and so on);
- the mounting of sensitive host directories in containers;
- TLS configuration of remote image registries;
- permissions on the Docker control socket; and
- cgroups activation (disabled by default).

For instance, when given the option net=host at container launch, Docker doesn't place the container into a separate NET namespace; it therefore gives the container full access to the host's network stack (enabling network sniffing, reconfiguration, and so on). The option uts=host leaves the container in the same UTS namespace as the host, which lets the container see and change the host's name and domain. The option cap-add=<CAP> gives the container the specified capability, thus making it potentially more harmful to the host. With cap-add=SYS_ADMIN, a container can, for example, remount /proc and /sys subdirectories in read/write mode and change the host's kernel parameters, leading to potential vulnerabilities, data leakage, or DoS.
Along with these runtime container options, several settings on the host can influence potential attacks. Even basic properties can at a minimum trigger DoS. For instance, when using some storage drivers (aufs), Docker doesn't limit containers' disk usage. A container with a storage volume can fill up this volume and affect other containers on the same host, or even the host itself, if the Docker storage located at /var/lib/docker isn't mounted on a separate partition.
As mentioned earlier, whatever the usages are, containers are an attack vector and therefore represent a potential threat for the host. This is even more relevant in widespread usages, where containers are used as VMs and thus have a bigger attack surface than microservice containers. They also have more vulnerabilities, leading to attacks such as container escapes.

Weak local access control. Beyond the kernel namespaces, cgroups, Docker dropping capabilities, and mount restrictions, mandatory access control (MAC) enforces constraints if the normal execution flow isn't respected. This approach is visible in the docker-default Apparmor policy. However, the MAC profiles for containers have room for improvement. In particular, Apparmor profiles typically behave as whitelists,14 explicitly identifying which resources any process can access while denying any other access when the profile is in enforce mode. However, the docker-default profile installed with the docker.io package gives containers complete access to network devices and filesystems with a full set of capabilities, and contains a small list of deny directives, which constitutes a de facto blacklist.
These vulnerabilities are relevant to all usages and could lead to the attacks mentioned earlier, such as DoS or container escapes.

Image distribution vulnerabilities. The distribution of images through the Docker hub and other registries in the Docker ecosystem is a source of vulnerabilities. Because these vulnerabilities are similar to those of classical package managers,12 we consider only the automated deployment pipeline perspective here.
Automated builds and webhooks proposed by the Docker hub are key elements in the image distribution process. They lead to a pipeline in which each element has full access to the code that will end up in production, and these elements are increasingly hosted in the cloud. For instance, to automate this deployment, Docker proposes automated builds on the Docker hub, triggered by an event from an external code repository (such as GitHub). Docker then proposes to send an HTTP request to a Docker host reachable on the Internet to notify it that a new image is available. This triggers an image pull and a container restart on the new image (through Docker hooks; see https://docs.docker.com/docker-hub/webhooks).
In this deployment pipeline, a commit on GitHub will trigger a build of a new image and automatically launch it into production. Optional test steps can be added before production, which might themselves be hosted at yet another provider. In this case, the Docker hub makes a first call to a test machine that will then pull the image, run the tests, and send re-
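The launch options discussed above are easiest to see side by side. The image name is hypothetical, and each command assumes a Docker host, so this is an illustrative sketch of the security tradeoffs rather than a recommended configuration:

```shell
# Shares the host's network stack: the container can sniff and
# reconfigure host interfaces.
docker run --net=host example/app

# Shares the host's UTS namespace: the container can read and change
# the host's hostname and domain name.
docker run --uts=host example/app

# Grants CAP_SYS_ADMIN: the container can remount /proc and /sys
# subdirectories and alter host kernel parameters.
docker run --cap-add=SYS_ADMIN example/app

# Disables most isolation mechanisms at once.
docker run --privileged example/app

# Conversely, cgroup limits are opt-in per container:
docker run -m 512m -c 512 example/app   # memory cap and CPU shares
```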
10. Docker, "Content Trust in Docker," Docker User Guide, 2016; https://docs.docker.com/engine/security/trust/content_trust.
11. J. Samuel et al., "Survivable Key Compromise in Software Update Systems," Proc. 17th ACM Conf. Computer and Comm. Security (CCS 10), 2010, pp. 61-72.
12. J. Cappos et al., "A Look in the Mirror: Attacks on Package Managers," Proc. 15th ACM Conf. Computer and Comm. Security, P. Ning, P.F. Syverson, and S. Jha, eds., 2008, pp. 565-574.
13. Docker, "Best Practices for Writing Dockerfiles," Docker User Guide, 2016; https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices.
14. Novell, Novell AppArmor Administration Guide, Oct. 2007; www.suse.com/documentation/apparmor/pdfdoc/book_apparmor21_admin/book_apparmor21_admin.pdf.
15. T. Combe, A. Martin, and R. Di Pietro, "Containers: Vulnerability Analysis," tech. report, Nokia Bell Labs; http://ricerca.mat.uniroma3.it/users/dipietro/containers_security.pdf.

[...] security, telecommunications, and programming. Contact him at theo-nokia@sutell.fr.

Antony Martin is a security analyst and member of the technical staff in the security department at Nokia Bell Labs, Nozay, France. His research interests include network security, virtualization, cloud computing, and network function virtualization. Martin has an engineering degree in telecommunications from Telecom Lille engineering school and holds a number of technical certifications. Contact him at antonymartin.pro@gmail.com.

Roberto Di Pietro is security research global head at Nokia Bell Labs, Paris-Saclay, France, and a part-time professor of computer science (security) at the University of Padova, Italy. His research interests include security, privacy, distributed systems, computer forensics, and analytics. Di Pietro has a PhD in computer science from the University of Roma La Sapienza. Contact him at roberto.di_pietro@nokia-bell-labs.com.
Cloud Security

User-Centric Security and Dependability in the Clouds-of-Clouds

Marc Lacoste, Orange Labs
Markus Miettinen, Technische Universität Darmstadt
Nuno Neves and Fernando M.V. Ramos, University of Lisbon
Marko Vukolić, IBM Research
Fabien Charmet and Reda Yaich, Institut Mines-Télécom
Krzysztof Oborzyński and Gitesh Vernekar, Philips Healthcare
Paulo Sousa, Maxdata Software
ent and user-configurable manner. Individual U-Clouds must be strictly separated, preventing, for instance, misbehaving U-Clouds from impacting other U-Clouds.
The architecture must also support interoperability at the infrastructure and platform levels. It should support a distributed cloud with flexibility and control levels similar to those in a single-provider scenario, for example, in terms of usage or migration of resources across providers. In particular, it should enable the deployment of legacy applications and management tools in the distributed cloud infrastructure.
Third, it should enable user-controlled security. It should allow users to define fine-grained security settings to control the protection level of their cloud resources. For instance, to meet legal requirements that prohibit transfer of particular data types across jurisdictional boundaries, users might need to control where their U-Cloud data is physically stored and processed. It must also protect user privacy by preventing cloud providers from accessing user data without the user's explicit consent.
Finally, the architecture should guarantee integrity and availability of services and data. It should allow specification and enforcement of measures related to integrity, redundancy, and disaster recovery of data resources as part of a user-provider SLA. Performance guarantees might also be required, namely on response times for critical accesses to some data resources.

System Architecture
We now describe the architecture of the Supercloud, both statically (that is, its components) and dynamically (that is, how these components interact to guarantee overall security).

Static Architecture
The Supercloud architecture allows customers to instantiate U-Clouds that run on the underlying
Figure 3. Detailed view of the Supercloud architecture: (a) compute plane, (b) data plane, and (c) network plane. Each figure shows detailed subcomponents for computation, data management, and networking, and interplay with security self-management. (OvS: Open vSwitch; SDN: software-defined networking controller)
changes to clients without installing additional libraries. In contrast, direct accessor clients run Supercloud-specific logic as a client library and can interact with and access storage servers and L1 cloud provider services directly. Direct accessor clients can also have certain features of storage servers built in. Such clients could thus also be independent of storage servers.
Proxies, typically L2 VMs, facilitate client access to Supercloud storage and data management offerings, such as for encryption and secure deduplication. They're usually stateless and can be easily added dynamically to the system.
Servers, typically stateful L1 or L2 VMs, perform housekeeping of critical portions of metadata vital to the Supercloud data plane's operation, such as metadata for storage, data integrity, or configuration management. Cloud provider services (CPSs) are L1 cloud storage services that direct accessor clients or proxies can directly access. They expose different APIs, notably object storage and block storage. Examples include OpenStack Swift and Amazon's Simple Storage Service (S3) and Elastic Block Store. Cloud provider data nodes are L1 VMs in the distributed provider infrastructure. Complementing CPSs, they can perform computation and have locally mounted L1 block storage for Supercloud user data.
Security self-management components allow arbitration between provider and user data security settings.

Network plane. Figure 3c illustrates the Supercloud network virtualization architecture. Its main design goals are network controllability; full network virtualization to guarantee isolation between users, while enabling them to use their desired addressing schemes and topologies; and VM snapshotting and migration for availability and flexibility.
To fulfill these objectives, the architecture leverages software-defined networking (SDN),11 which provides logically centralized control over the forwarding and configuration state of the software switches running in the Supercloud VMs. OpenFlow and Open vSwitch (OvS) technologies provide fine-grained control of packet forwarding and of switch configurations, respectively. Logical centralization of control facilitates isolation, for example, through flow rule redefinition at the network edge, with translation of physical to virtual events. Availability goals extend well-proven techniques to the multicloud setting.
For each user, a specific set of network applications that control the virtual network will run on top of the Supercloud network hypervisor that maps the virtual and physical resources. These include an address translator (to offer L2 and L3 address virtualization), a topology abstraction module (for topology virtualization), and a resource isolation application (to slice network resources among tenants, such as switch CPU and forwarding tables). The network hypervisor controls and configures the OvS switches that are installed in all VMs. An SDN controller will establish secure connections with each OvS switch to control the forwarding plane.
The network hypervisor is built as an application that runs in the Supercloud SDN controller. Each cloud will host a specific VM, the network proxy, where secure tunnels are set up to all other clouds. In a distributed configuration, each proxy will host an instance of the SDN controller.
Security management is facilitated through the interplay of the overall security self-management and network security management components, which enable Supercloud users to specify user-specific settings for network configurations inside their U-Clouds.

Dynamic Architecture
Figure 4 illustrates two typical workflows between some key Supercloud architecture components. User 1 interacts with its VM (u1VMx) through a set of APIs. Providers 1 and 2 host compute (VMx), networking (NVMx), and storage management (DVMx) VMs. Provider 1 also hosts a physical storage service. Supercloud considers a nested architecture; that is, u1VMx runs inside VMx.
Supercloud users interact through four interfaces to deploy their applications in the cloud. The network plane interface, typically the network hypervisor, interacts with the SDN controller and network proxies, hosted in the NVMx machines, to handle communication and establish secure tunnels with other clouds. The data plane interface, typically storage proxies, interacts with the providers' DVMx VMs to ensure access to the users' private data. The compute plane interface, typically the L1 hypervisor, interacts with providers' VMx machines to provide memory and CPU resources.
We describe the interfaces between Supercloud elements in several scenarios. The first scenario relates to requesting data from the cloud storage; the second relates to establishing communication between two VMs hosted in the Supercloud. The last example shows how the Supercloud security management interfaces enable users to deploy security services.

Access to cloud storage. In this scenario (steps a-f in Figure 4), during a request to the data layer (step a), the user VM (uVM) sends a request to the
Figure 4. Sample Supercloud workflows. Shown are the Supercloud interfaces (computation, data, networking, self-management) to deploy applications in several scenarios: access to cloud storage, establishing communications between VMs, and self-management of security.
data management VM (DVM) (step b). The DVM is aware of the resource's physical location inside the cloud infrastructure. It provides this information to the hypervisor hosting the uVM (step c), which will ask the network management VM (NVM) (step d) to establish a connection between the resource and the uVM through the SDN network (steps e and f).

Establishing communication between user VMs. In this scenario (steps 1-6 in Figure 4), after receiving a request from User 1 (step 1), uVM1 sends a communication request through the hypervisor (step 2). The hypervisor forwards the request to the NVM (step 3), which establishes the SDN rules for the path to communicate with NVM2 (step 4). If the destination VM is hosted on a different CSP, the NVM forwards the request to the NVM of the other CSP, hence setting up the connection. Finally, NVM2 shares the request with VM2 (step 5), which is hosting uVM2 (step 6). Each component (VMx, DVMx, and NVMx) is accessed independently of the provider owning the physical resource.

Security management. Users and providers also interact with a security management plane interface (see Figures 2 and 3) to deploy, orchestrate, enforce, and monitor security requirements. Such requirements are specified and negotiated through SLAs during the cloud service discovery and brokering phases. This distributed protection plane is realized through the interplay of several security self-management components spread across the Supercloud abstraction planes.
The resource management components are self-management agents (SMAs) responsible for delivering atomic security services such as enforcement, detection, reaction, and monitoring. These components operate on a particular architecture abstraction plane and are dedicated to a specific security service (such as intrusion detection, authorization enforcement, or trust management). Some security services might require multiple SMAs across multiple planes and/or providers. For instance, intrusion detection might require the collaboration of multiple (cross-provider) SMAs to collect, aggregate, and process activity logs.12
Aggregation components provide a unified and uniform view of multiple SMAs to the orchestrator. They abstract the heterogeneity of provider security mechanisms, meeting platform independence and [...]
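The controller-to-switch relationship that the SDN-based network plane relies on can be sketched with standard Open vSwitch commands. The bridge name and controller address are hypothetical, and the commands assume a VM with OvS installed, so this is an illustrative sketch rather than the Supercloud implementation:

```shell
# Create a software switch inside a Supercloud VM.
ovs-vsctl add-br br0

# Attach it to the SDN controller over an encrypted (SSL) channel,
# matching the architecture's secure controller-to-switch connections.
ovs-vsctl set-controller br0 ssl:controller.example.net:6633

# Inspect the flow rules the controller has pushed to this switch,
# for example, the edge rules that translate virtual to physical addresses.
ovs-ofctl dump-flows br0
```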
Orchestration components are decision-making components providing security services. Each component is a manager for a specific security service such as authorization and access control, intrusion detection and prevention, and trust management. In addition, an overall orchestrator coordinates the actions of all security managers; a planner generates plans to reach and/or maintain security objectives; and a storage manager guarantees persistence and delivery of the knowledge needed for self-management of security. Orchestration components are also responsible for retrieving user security requirements from SLAs, converting them into policies and configurations to be enforced, and detecting and managing conflicts between tenants, users, and/or providers.

Use Cases
To illustrate how the Supercloud architecture can be mapped to real-world use cases, we use examples from the healthcare domain.

Hospital Imaging Archive
The amount of diagnostic imaging data is quickly increasing, imposing great challenges on hospital archive infrastructures, which must ensure high data availability, security, and regulatory compliance. A cloud-based solution can help address these challenges.

Such a solution's architecture should minimize the risk of security breaches and privacy violations, including unprivileged access to data (both at rest and during processing) with regard to defined policies. These policies might include hospital-specific context, legal country boundaries, and user groups. In terms of performance, robust data processing with low latency is desired, especially across different clouds.

Hospitals can store their clinical data as well as their imaging studies in on-premises private cloud storage. Archiving in the cloud helps simplify data management and the hospital archive infrastructure, especially due to high-volume imaging studies that are often as large as 1 Gbyte. Since on-premises storage can be limited, it makes sense to store this data securely in public cloud storage. For example, a hospital might store data from the last six months in the private cloud's on-premises storage, while storing older data (10 years or more) in the public cloud. Figure 5a shows a sample Supercloud implementation of such a solution.

In Figure 5a, three hospitals (A, B, and C) share a private cloud to store and manage their clinical data. A VM on the compute plane is dedicated to each hospital for its operations. Whenever a hospital VM wants to store or retrieve clinical data (for example, MRI imaging data), it communicates with a picture archiving and communication system (PACS) VM interfacing with the data plane. The data plane provides an abstraction to the user VM, making all underlying storage directly accessible (including encrypting stored data). Data older than six months is stored on the public cloud, whereas recent data is kept in the private cloud's on-premises storage, providing instantaneous access to it. Here, the network plane is responsible for handling all communications across different clouds and VMs. Hospitals can also define their security policies, such as how other hospitals can access their data. Components that are dedicated to data security and security management across the L1 hypervisor and compute VMs will prevent any unprivileged access to data based on security policies defined by each hospital.

Healthcare Laboratory Information System
This use case demonstrates the impact of the Supercloud architecture for Maxdata Software, a healthcare software vendor that aims to deploy its software on the cloud as SaaS while enforcing the security requirements of different healthcare institutions.

The CLINIdATA healthcare laboratory information system (LIS) is a cross-platform Web application in which server components can run on any common operating system and relational database. The CLINIdATA LIS must integrate with dozens of other clinical and nonclinical information systems (such as intensive care units, patient identification, billing, and regional health portals). It includes a set of real-time interfaces with physical electronic equipment (automated analyzers). The solution consists of three components on the server side: a stateless application, a database engine, and database data. The Supercloud approach allows each healthcare institution to define the U-Cloud that best fits its needs. Concrete deployment on physical cloud providers is then automated. The considered setting is a large hospital cluster that employs thousands of professionals, processes tens of millions of transactions per day, and is located in a country where personal data protection must be guaranteed.

In a typical U-Cloud specification:
• the application and database engine are replicated across several VMs on the compute plane (fault tolerance and load balancing);
• data is split among different storage nodes in the data plane (offering confidentiality, even if one storage node is compromised);
Figure 5. Supercloud practical deployments: (a) high-availability storage and disaster recovery, and (b) healthcare laboratory information system. (USS: user-centric system service)
• a set of networks connect application VMs to automated analyzers running on hospital premises and to database engine VMs, which in turn are connected to storage nodes;
• VMs on the compute plane ensure confidentiality, integrity, and 99.99 percent availability;
• storage nodes on the data plane ensure data integrity and 99.99 percent availability; and
• data may be processed and stored only in a predefined set of countries.

As Figure 5b shows, a Supercloud infrastructure can then deploy the VMs on a trusted private cloud to ensure confidentiality on the compute plane, instantiate the storage nodes on a set of public cloud providers running security mechanisms (such as encryption and secret sharing) to ensure confidentiality, and connect the different components using virtual networks provided by the network plane. Deployments consider the locations or countries specified by the healthcare institution. Replicated instances of the CLINIdATA LIS application run on VMs on the compute plane. These instances then connect to the database engine running on a different VM linked with the data plane.

In case of regulatory, economic, or other types of change, healthcare institutions can update U-Cloud requirements and/or features. The Supercloud infrastructure automatically redeploys the solution accordingly, enabling quick adaptation to context changes. It also prevents vendor lock-in.

We're implementing the different components of the Supercloud architecture to gradually achieve integrated proofs of concept. The solution is currently at an advanced stage of implementation. Several results are already available (see https://Supercloud-project.eu/publications-deliverables). However, we're still integrating the various components. Preliminary performance results have shown relatively modest overheads, giving good indications about the potential of the solution (such as for network virtualization13). Our next step is to validate the approach through testbed integration. Other foreseen application domains include network function virtualization and smart home security. Results will be disseminated to promote open source cloud technologies and will be contributed to major standardization bodies.

Acknowledgments
This work is supported by the European Union Supercloud Project (Horizon 2020 Research and Innovation Program, grant 644962) and by the Swiss Secretariat for Education Research and Innovation (contract 15.0091). It is based on contributions from the entire Supercloud consortium.

References
1. F. Bonomi et al., Fog Computing and Its Role in the Internet of Things, Proc. 1st Workshop Mobile Cloud Computing (MCC), 2012, pp. 13-16.
2. F. Manco et al., The Case for the Superfluid Cloud, Proc. 7th USENIX Workshop Hot Topics in Cloud Computing (HotCloud), 2015.
3. L. Zheng et al., How to Bid the Cloud, Proc. ACM Conf. Special Interest Group on Data Comm. (SIGCOMM), 2015, pp. 71-84.
4. R. Los, D. Shackleford, and B. Sullivan, Notorious Nine: Cloud Computing Top Threats in 2013, tech. report, Cloud Security Alliance, 2013.
5. D. Sgandurra and E. Lupu, Evolution of Attacks, Threat Models, and Solutions for Virtualized Systems, ACM Computing Surveys, vol. 48, no. 3, 2016, pp. 1-38.
6. D. Williams, H. Jamjoom, and H. Weatherspoon, Plug into the Supercloud, IEEE Internet Computing, vol. 17, no. 2, 2013, pp. 28-34.
7. A. Ludwig and S. Schmid, Distributed Cloud Market: Who Benefits from Specification Flexibilities? ACM SIGMETRICS Performance Evaluation Rev., vol. 43, no. 3, 2015, pp. 38-41.
8. K. Bernsmed et al., Thunder in the Clouds: Security Challenges and Solutions for Federated Clouds, Proc. IEEE 4th Int'l Conf. Cloud Computing Technology and Science (CloudCom), 2012, doi:10.1109/CloudCom.2012.6427547.
9. M. Ben-Yehuda et al., The Turtles Project: Design and Implementation of Nested Virtualization, Proc. 9th USENIX Conf. Operating Systems Design and Implementation (OSDI), vol. 10, 2010, pp. 423-436.
10. P. Watson, Application Security through Federated Clouds, IEEE Cloud Computing, vol. 1, no. 3, 2014, pp. 76-80.
11. D. Kreutz et al., Software-Defined Networking: A Comprehensive Survey, Proc. IEEE, vol. 103, no. 1, 2015, pp. 14-76.
12. S.T. Zargar et al., DCDIDP: A Distributed, Collaborative, and Data-Driven Intrusion Detection and Prevention Framework for Cloud Computing Environments, Proc. 7th Int'l Conf. Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), 2011, pp. 332-341.
13. M. Alaluna, F. Ramos, and N. Ferreira Neves, (Literally) above the Clouds: Virtualizing the
Standards Now
For Internet of Things (IoT) and sensor-oriented settings, as discussed in the previous issue on manufacturing, the Sensor Network Object Notation (SNON, www.snon.org) is a representation based on JSON that includes some predefined fields that are especially useful in dealing with sensor data. In addition, the Data Distribution Service (DDS, http://www.omg.org/spec/DDS) and DDS Data Local Reconstruction Layer (DDS-DLRL) specifications were developed by the Object Management Group specifically to handle data interchange tasks related to IoT systems.

General data standards are available to deal with the wide variety of data format standards. As a result, current cloud microservice designs are burdened with a huge variety and multiplicity of API definitions.

In previous columns, I've referred to the API directory maintained, for example, by the website ProgrammableWeb.com, which at the time of this writing maintains a directory of more than 15,000 APIs (www.programmableweb.com/apis/directory). This situation requires APIs either to be designed to work in small subsets of the application arena in which the API is stable, or to be built to a common self-describing or standardized pattern. Examples of effective API standards are the RESTful API Markup Language (RAML, http://raml.org) and Swagger, which has evolved into the Open API Initiative (https://openapis.org), as discussed in previous columns.
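To make the sensor-data-format discussion concrete, here is a minimal sketch of a JSON-encoded sensor reading of the kind such representations standardize. The field names are illustrative only; they are not SNON's normative fields.

```python
import json
from datetime import datetime, timezone

# Illustrative only: these field names are NOT normative SNON fields.
# They merely show the kind of predefined metadata (identity, timestamp,
# value, units) that sensor-oriented JSON representations standardize.
reading = {
    "sensor_id": "building-3/room-12/temp-1",
    "observed_at": datetime(2016, 9, 1, 12, 0,
                            tzinfo=timezone.utc).isoformat(),
    "value": 21.4,
    "unit": "Cel",
}

# Serialize for transport, then parse as a consumer would.
payload = json.dumps(reading, sort_keys=True)
decoded = json.loads(payload)
```

Because the representation is plain JSON, any microservice with a JSON parser can consume it without a bespoke binding.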
The discussion here has focused on the design and architecture of microservices. I've covered considerations related to packaging and delivery of microservices in containers, data exchange and data formats, and messaging and networking, focusing on some up-to-date topics on standards related to these areas.

My next column will address topics related to microservices orchestration, including relevant standards such as the Topology and Orchestration Specification for Cloud Applications (TOSCA) and Cloud Application Management for Platforms (CAMP); microservices control, including the Open Cloud Computing Interface (OCCI) and Cloud Infrastructure Management Interface (CIMI) standard sets; and serverless microservices, such as Amazon Lambda and related concepts. I'll also take another look at the SOA basis for microservice architectures to tie both of these columns together.

As always, this discussion only represents my own viewpoint. I'd like to hear your opinions and experience in this area. I'm sure other readers of the magazine would also appreciate additional information on this topic.

Please respond with your input on this or previous columns. Please include news you think the community should know in the general areas of cloud standards, compliance, or related topics. I'm happy to review ideas for potential submissions to the magazine or for proposed guest columns. I can be reached for this purpose at alan.sill@standards-now.org.

References
1. Ecma International, The JSON Data Interchange Format, Ecma-404, 1st ed., 2013; www.ecma-international.org/publications/standards/Ecma-404.htm.
2. T. Bray, ed., The JavaScript Object Notation (JSON) Data Interchange Format, IETF RFC 7159, 2014; www.rfc-editor.org/info/rfc7159.
3. M. Duke et al., A Roadmap for Transmission Control Protocol (TCP) Specification Documents, IETF RFC 7414, 2015; www.rfc-editor.org/info/rfc7414.
4. A. Sill, Standards Underlying Cloud Networking, IEEE Cloud Computing, vol. 3, no. 3, 2016, pp. 76-80.
5. J. Postel, User Datagram Protocol, IETF RFC 768, 1980; www.rfc-editor.org/info/rfc768.
6. R. Stewart, ed., Stream Control Transmission Protocol, IETF RFC 4960, 2007; www.rfc-editor.org/info/rfc4960.
7. Z. Shelby, K. Hartke, and C. Bormann, The Constrained Application Protocol (CoAP), IETF RFC 7252, 2014; www.rfc-editor.org/info/rfc7252.
8. P. Saint-Andre, Extensible Messaging and Presence Protocol (XMPP): Core, IETF RFC 6120, 2011; www.rfc-editor.org/info/rfc6120.
9. P. Saint-Andre, Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence, IETF RFC 6121, 2011; www.rfc-editor.org/info/rfc6121.
10. P. Saint-Andre, Extensible Messaging and Presence Protocol (XMPP): Address Format, IETF RFC 7622, 2015; www.rfc-editor.org/info/rfc7622.
11. L. Stout, ed., An Extensible Messaging and Presence Protocol (XMPP) Subprotocol for WebSocket, IETF RFC 7395, 2014; www.rfc-editor.org/info/rfc7395.
12. Information Technology - Advanced Message Queuing Protocol (AMQP), Int'l Organization for Standardization/Int'l Electrotechnical Commission, ISO/IEC 19464, v.1.0, 2014; www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=64955.
13. A. Karmel, R. Chandramouli, and M. Iorga, NIST Definition of Microservices, Application Containers and System Virtual Machines, Nat'l Inst. of Standards and Technology (NIST) Special Publication 800-180, 2016; http://csrc.nist.gov/publications/drafts/800-180/sp800-180_draft.pdf.

Alan Sill directs the US National Science Foundation's Cloud and Autonomic Computing industry/university cooperative research center. He's interim senior director of the High Performance Computing Center and adjunct professor of physics at Texas Tech University, and visiting professor of distributed computing at the University of Derby. Sill has a PhD in particle physics from American University. He's an active member of IEEE, the Distributed Management Task Force, and the TeleManagement Forum, and he serves as president of the Open Grid Forum. He's a member of several cloud standards working groups and national and international standards roadmap committees, and he remains active in particle physics and advanced computing research. Contact him at alan.sill@standards-now.org.
Figure 1. Comparison of cloud architectures: (a) hypervisor-based application deployment, (b) hypervisor-free containerized microservice, and (c) containerized microservice within hypervisor-managed physical cloud hardware.
vides native clustering for Docker containers. It turns a pool of Docker hosts into a single virtual Docker host. Because Docker Swarm serves the standard Docker API, any tool that already communicates with a Docker daemon can use Swarm to transparently scale to multiple hosts. A Docker container manager represents the basic container-oriented technology.

Kubernetes is an open-source technology for automating deployment, operations, and scaling of containerized applications. It groups the containers making up an application into logical units for easy management and discovery, for example, based on their resource requirements and other constraints. Kubernetes also provides horizontal scaling of applications, which can be performed manually or automatically based on CPU load. Finally, it provides automated rollouts and rollbacks and self-healing features.

Magnum is the OpenStack API service that makes container orchestration engines such as Docker Swarm and Kubernetes available as first-class resources in the OpenStack managed datacenter. Magnum uses the Heat service to schedule an operating system image, which contains Docker and Kubernetes, and runs this image on either VMs or a bare metal cluster.

The Google Container Engine provides a commercial service that relies on Docker and Kubernetes for cluster management and orchestration. Similarly, the Amazon Elastic Compute Cloud (EC2) container service supports Docker containers to be deployed on a managed cluster of Amazon EC2 instances. Rackspace is slightly behind with respect to container-based offerings. Its beta service, Carina, is based on Docker Swarm and doesn't provide any elasticity features.

With regard to networking containerized microservices, OpenStack Neutron supports the management of virtual LANs in cloud datacenters by creating ad hoc NFV. NFV uses virtualization technologies to manage core networking functions via software instead of relying on hardware to handle these functions.

Creating NFVs using Open Virtual Network (OVN) technology guarantees an efficient and secure use of the network. OVN complements existing SDN capabilities, adding native support for virtual network abstractions, such as virtual L1 and L2 overlays and security groups. OVN also supports the security inspection of data transfer inside virtual networks (for example, packet inspection); hence it provides extra features useful for increasing customer security and privacy.

Open Issues in Scheduling and Resource Management
Despite the clear technological advances in container and hypervisor-based virtualization technologies, we are yet to realize a standard large-scale, performance-optimized scheduling platform for managing an ecosystem of microservices networked together to create a specialized application stack, such as a multitier Web application or an Internet of Things (IoT) application. Future efforts will focus on solving the following research challenges.

Configuration Selection and Management
A cloud application (for example, a multitier Web application) must typically combine multiple interdependent microservices that provide diverse functionalities, for example, a load balancer, webserver, and database server. Moreover, these microservices have both control and dataflow dependencies.

The challenges exist in dealing with heterogeneous configurations of microservices and cloud datacenter resources driven by heterogeneous performance requirements. With the increase in microservice application functionality types (encryption, compression, SQL/NoSQL server, virtual private network, and so on) and the heterogeneity of container engines (LXC, Docker, Google, and Amazon) and underlying cloud datacenter resources, the mapping of microservices to datacenters demands selecting bespoke configurations from an abundance of possibilities,5 which is impossible to resolve manually.

Branded price calculators, available from public cloud providers (Amazon and Azure, for example) and academic projects (Cloudorado), allow comparison of datacenter resource leasing costs. However, these calculators can't recommend or compare configurations across microservices and datacenter resources.

We therefore need new research that focuses on developing techniques for accurately modeling,
same container or on the same physical host; live migration of containers to reduce interference and contention; and tradeoffs between live migration and restarting.

Microservice Monitoring
Guaranteed application performance requires clear and real-time understanding of performance metrics across microservices and datacenter resources. However, variations in performance metrics across different microservices and datacenter resources complicate this problem. For example, key performance metrics for SDN resources are throughput and latency; for CPU resources, they're utilization and throughput; and for SQL and NoSQL database microservices, it's query response time. Therefore, how to define and formulate performance metrics coherently across microservices to give a holistic view of data and control flows remains an open issue.

Monitoring tools that were popular in the grid and cluster computing era (for example, R-GMA and Hawkeye) were concerned only with monitoring performance metrics at the datacenter resource level (such as CPU percentage and TCP/IP performance), but not at the microservice level (such as end-to-end request processing latency and communication overhead). Cluster-wide monitoring frameworks (Nagios, Ganglia, Apache Hadoop, and Apache Spark) provide information about hardware metrics (cluster, CPU, and memory utilization, and so on) of cluster resources that might belong to a public or private cloud datacenter.12,13 Monitoring frameworks used by the Amazon EC2 Container Service (Amazon CloudWatch) and Kubernetes (Heapster) typically monitor CPU, memory, filesystem, and network usage statistics, so they can't monitor microservice-level performance metrics.

This leads to several new research topics, including development of holistic techniques13 for collecting and integrating monitoring data from all microservices and datacenter resources so administrators or a scheduler (a computer program) can track and understand the impact of runtime uncertainties (for example, failure, load-balancing efficiency, and overloading) on performance without understanding the whole platform's complexity.

Elastic Scheduling and Runtime Adaptation
The elastic scheduling of microservices is a complex research problem due to several runtime uncertainties. First, it's difficult to estimate microservice workload behavior in terms of request arrival rate, type, and processing time distributions; I/O system behavior; and the number of users connecting to different types and mixes of microservices. The real challenge in devising microservice-specific workload models is to accurately learn and fit statistical functions to the monitored distributions, such as request arrival patterns, CPU usage patterns, memory usage patterns, I/O system behaviors, request processing time distributions, and network usage patterns.

Without knowing the workload behaviors of microservices, it's difficult to make decisions about the types and scale of datacenter resources to be provisioned to microservices at any given time. Furthermore, the availability, load, and throughput of datacenter resources can vary in unpredictable ways, due to failure or congestion of network links.

Kubernetes offers a microservice container reconfiguration feature, which scales by observing CPU usage (elasticity is agnostic to the workload behavior and performance objectives of microservices). Amazon's autoscaling service employs simple threshold-based rules or scheduled actions based on a timetable to regulate infrastructural resources (for example, if the average CPU usage is above 40 percent, add another microservice container). Other cloud providers have implemented similar simple rule-based reactive runtime scheduling techniques: Google's Cloud Platform autoscaler, Rackspace's Auto Scale, Microsoft Azure's Fabric Controller, and IBM's SoftLayer autoscale.

To the best of our knowledge, no prior work has developed workload and resource performance prediction models to enable reconfiguration (scaling, descaling, and migration) of microservices on cloud datacenters while ensuring microservice-specific performance objectives. Hence, important new research is investigating predictive workload and performance models to forecast workload input and performance metrics across multiple, coexisting microservices deployed on cloud datacenter resources.
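The simple threshold-based rule mentioned above (for example, add a container when average CPU usage exceeds 40 percent) can be sketched as follows. The thresholds and replica limits here are illustrative, not any provider's defaults, and the sketch deliberately exhibits the limitation the text criticizes: the rule sees only CPU, not workload mix or per-microservice performance objectives.

```python
def desired_replicas(current_replicas, avg_cpu,
                     scale_up_at=0.40, scale_down_at=0.20,
                     min_replicas=1, max_replicas=10):
    """Reactive, rule-based scaling in the style of Amazon's autoscaling
    or Kubernetes' CPU-driven scaling. Thresholds are illustrative;
    real services let operators configure them. Note that the decision
    is agnostic to workload type and performance objectives: it only
    observes aggregate CPU utilization."""
    if avg_cpu > scale_up_at:
        # Overloaded: add one container, up to the configured ceiling.
        return min(current_replicas + 1, max_replicas)
    if avg_cpu < scale_down_at:
        # Underutilized: remove one container, but keep a minimum alive.
        return max(current_replicas - 1, min_replicas)
    # Within the dead band: leave the deployment unchanged.
    return current_replicas
```

A predictive scheduler of the kind the authors call for would instead forecast the request arrival and resource usage distributions and provision ahead of load, rather than reacting after a threshold is crossed.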
Evolution of Microservice-Powered Cloud Paradigms
Wide-scale adoption of containerization technologies and microservices architectures will strongly influence other emerging computing paradigms.

Cloud Computing and Internet of Things
The combination of cloud computing and the IoT is presenting new opportunities for delivering new types of application services (see Figure 2). For example, private, public, and hybrid cloud providers are looking to integrate their datacenters' software and hardware stacks with embedded devices (including sensors and actuators) to provide IoT as a service (IoTaaS).

Typically, IoT devices run customized software developed with a particular programming language and/or development framework. Minimal processing and storage tasks can be performed in IoT devices (for example, a sensor gateway or SDN virtualization) by deploying lightweight, containerized microservices.14,15 Meanwhile, the massive data storage and processing tasks (data mining and big data analytics) are performed in cloud datacenters that exploit virtualization (both hypervisor- and container-based) to elastically scale storage and processing capabilities up and down.

Figure 2. A microservice as the enabler for the IoT application cloud. IoT applications are decomposed into collections of microservices, which are distributed across the physical hardware resources available in the cloud and on the network edge.

Federated Clouds
The cloud services market has been growing in recent years, a trend that's confirmed by the number of cloud providers that have appeared on the market. Currently, small and medium cloud providers can't directly compete with the big players (such as Google, Amazon, and Microsoft), so they must implement new business strategies to penetrate the market.16,17

In particular, small and medium providers can establish stronger partnerships to share resources according to the rules of the cloud federation ecosystem they belong to. Small providers can federate with large providers to gain economies of scale, optimize their assets, scale their capabilities, and share resources to establish new forms of collaboration. If a small provider's cloud runs out of capacity, it can migrate its microservices to federated datacenters to ensure business continuity (see Figure 3).

However, federated clouds need to respond to high heterogeneity across independent cloud systems, efficient and secure data exchange among clouds, and the ability to efficiently deploy resources and services across such federated systems. Indeed, the dynamism of a federation with incoming and outgoing providers and variable resource availability makes microservices and containers the best solution to quickly adapt to changes in the federated system.

Microservices will simplify orchestration of networked applications across heterogeneous cloud datacenters and emerging microdatacenters (on the network edge). However, the creation of such applications (for example, smart city and smart healthcare IoT clouds) requires new research into scheduling and resource management algorithms and platforms for managing highly distributed and networked microservices.

References
1. A. Sill, The Design and Architecture of Microservices, IEEE Cloud Computing, vol. 3, no. 5, 2016, pp. 76-80.
2. C. Pahl and B. Lee, Containers and Clusters for Edge Cloud Architectures: A Technology Review, Proc. 3rd Int'l Conf. Future Internet of Things and Cloud (FiCloud), 2015, pp. 379-386.
3. M. Xavier et al., Performance Evaluation of
Figure 3. Microservices as the basis of federating multiple cloud datacenters as part of a cohesive federation, where datacenter providers can meet the performance requirements of client applications through optimal placement and migration of microservices across datacenters.
Container-Based Virtualization for High Per- 8. M.K. Qureshi and Y.N. Patt, Utility-Based
formance Computing Environments, Proc. Cache Partitioning: A Low-Overhead, High-
21st Euromicro Intl Conf. Parallel, Distributed, Performance, Runtime Mechanism to Partition
and Network-Based Processing (PDP), 2013, pp. Shared Caches, Proc. 39th Ann. IEEE/ACM
233240. Intl Symp. Microarchitecture (Micro 06), 2006,
4. C. Esposito, A. Castiglione, and K.-K.R. Choo, pp. 423432.
Challenges in Delivering Software in the Cloud 9. Y. Xie and G.H. Loh, Pipp: Promotion/Inser-
as Microservices, IEEE Cloud Computing, Vol. tion Pseudo-Partitioning of Multi-Core Shared
3, no. 5, 2016, pp. 1014. Caches, Proc. 36th Ann. Intl Symp. Computer
5. R. Ranjan et al., Cross-Layer Cloud Resource Architecture (ISCA 09), 2009, pp. 174183.
Configuration Selection in the Big Data Era, 10. S. Govindan et al., Cuanta: Quantifying Ef-
IEEE Cloud Computing, vol. 2, no. 3, 2015, pp. fects of Shared On-Chip Resource Interference
1622. for Consolidated Virtual Machines, Proc. 2nd
6. M. Caballer et al., Dynamic Management of ACM Symp. Cloud Computing (SOCC 11), 2011,
Virtual Infrastructures, J. Grid Computing, vol. article 22.
13, Mar. 2015, pp. 5370. 11. R. Nathuji and A. Kansal, Q-Clouds: Manag-
7. W. Felter et al., An Updated Performance Com- ing Performance Interference Effects for QoS-
parison of Virtual Machines and Linux Contain- Aware Clouds, Proc. 5th European Conf. Com-
ers, Proc. IEEE Intl Symp. Performance Analysis of puter Systems (EuroSys 10), 2010, pp. 237250.
Systems and Software (ISPASS), 2015, pp. 171172. 12. R. Ranjan, Streaming Big Data Processing in
interests include grid computing, peer-to-peer networks, cloud computing, Internet of Things, and big data analytics. Ranjan has a PhD in computer science and software engineering from the University of Melbourne (2009). Contact him at raj.ranjan@ncl.ac.uk or http://rajivranjan.net.

Chang Liu is a research fellow (assistant professor) at Newcastle University, UK. His research interests include cloud computing, big data, distributed systems, Internet of Things, and information security and privacy. Liu has a PhD in information technology from the University of Technology, Sydney, Australia. Contact him at changliu.it@gmail.com.

Lydia Y. Chen is a research staff member at the IBM Zurich Research Lab, Zurich, Switzerland. Her research interests include modeling and optimizing performance and dependability for big data applications and highly virtualized datacenters. She received a PhD in operations research from the Pennsylvania State University. Contact her at yic@zurich.ibm.com.
Maria Fazio is an assistant researcher in computer science at the University of Messina. Her research interests include distributed systems and wireless communications, especially with regard to the design and development of cloud solutions for IoT services and applications. Fazio has a PhD in advanced technologies for information engineering from the University of Messina. Contact her at mfazio@unime.it.

Massimo Villari is an associate professor of computer science at the University of Messina. His research interests include cloud computing, Internet of Things, big data analytics, and security systems. Villari has a PhD in computer engineering from the University of Messina. He's a member of IEEE and IARIA boards. Contact him at mvillari@unime.it.