CS8791 - Cloud Computing
Cloud computing
Dr. Bhushan Jadhav
Ph.D. Computer Engineering
Assistant Professor, Information Technology Department,
Thadomal Shahani Engineering College,
Bandra, Mumbai.
Sonali Jadhav
M.E. Computer Engineering
Assistant Professor, Computer Engineering Department,
D. J. Sanghvi College of Engineering,
Mumbai.
TECHNICAL PUBLICATIONS - Since 1993, An Up-Thrust for Knowledge
Cloud Computing
Subject Code : CS8791
Published by : Technical Publications
Printer :
Yogiraj Printers & Binders
Sr. No. 10/1A,
Ghule Industrial Estate, Nanded Village Road,
Tal-Haveli, Dist-Pune - 411041.
ISBN 978-93-90041-22-0
Contents
1.1 Introduction to Cloud Computing
1.2 Definition of Cloud Computing
1.3 Evolution of Cloud Computing
1.4 Underlying Principles of Parallel and Distributed Computing
1.5 Cloud Characteristics
1.6 Elasticity in Cloud
1.7 On-demand Provisioning
1.8 Challenges in Cloud Computing
The limitations of peer-to-peer architecture are that it incurs additional capital cost for implementation, generates too many request/reply messages that cause congestion in the network, and makes the traffic flow difficult to manage.
The server is responsible for handling resource sharing, task processing and communication between the clients, so clients have to rely on the server for various accesses and services. It is faster than peer-to-peer architecture, as the server is responsible for granting and denying permission for access. The generalized client-server architecture is shown in Figure 1.1.3.
The client-server architecture has centralized processing, which gives faster communication with good performance. Cloud computing also follows the client-server architecture, but on a massive scale, which gives seamless delivery of services with flexibility, scalability and mobility at lower cost.
c) Grid Computing
The grid computing architecture has geographically distributed computing resources which work together to perform a common task. A typical grid has a pool of loosely coupled computers that work together to solve a complex computational problem. It has heterogeneous resources which are controlled by a common node, called the control node, much as in client-server architecture.
Conceptually, grid computing works similarly to cloud computing in providing services through a shared pool of resources. However, grid computing follows a fully distributed architecture, while cloud computing follows a centrally managed distributed architecture. In grid computing, the compute resources are distributed across cities, countries and continents and are therefore managed completely in a distributed manner, whereas in cloud computing the resources are distributed but managed centrally. Cloud computing is advantageous over grid computing in terms of availability, scalability, flexibility, disaster recovery and load balancing.
d) Distributed computing
It is a computing concept that refers to multiple computer systems working on a single problem. In distributed computing, a single problem is divided into many parts, and each part is executed by a different computer. As long as the computers are networked, they can communicate with each other to solve the problem. If it is done properly, the computers perform like a single entity. The ultimate goal of distributed computing is to maximize performance by connecting users and IT resources in a cost-effective, transparent and reliable manner. This type of computing is highly scalable.
e) Cluster computing
Cluster computing is also intended to solve a complex computational problem using a group of computers connected through a network. Generally, a cluster is a collection of interconnected, loosely coupled, homogeneous computers that work together so closely that in some respects they can be regarded as a single computer. Each cluster is composed of multiple standalone machines connected by a network. Modern clusters are typically designed to handle difficult problems that require nodes to share intermediate results frequently, which demands a high-bandwidth, low-latency interconnection network.
A cluster consists of loosely coupled computers in which the local operating system of each computer manages its own resources. Therefore, the cluster needs to merge the multiple system images into a single system image to support sharing of CPUs, memories and I/O across cluster nodes. The single system image (SSI) can be formed only with the help of middleware that makes the cluster appear like a single machine to the user. Without middleware, clusters cannot work efficiently to achieve cooperative computing.
Definition :
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
Each part is further broken down into a series of instructions which execute simultaneously on different processors under an overall control/coordination mechanism. Here, the different processors share the workload, producing much higher computing power and performance than could be achieved with a traditional single-processor system.
Parallel computing is often correlated with parallel processing and parallel programming. Processing of multiple tasks and subtasks simultaneously on multiple processors is called parallel processing, while parallel programming refers to programming a multiprocessor system using the divide-and-conquer technique, where a given task is divided into subtasks and each subtask is processed on a different processor.
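As a minimal sketch of this divide-and-conquer idea, the Python example below splits a large summation into subtasks and runs each subtask in a separate worker process; the problem and chunk sizes are chosen purely for illustration.

    # Divide-and-conquer parallel programming: a large task (summing
    # squares over a range) is split into subtasks, and each subtask
    # is processed by a different worker process.
    from multiprocessing import Pool

    def sum_squares(chunk):
        # Subtask: executed independently on a separate processor/core.
        lo, hi = chunk
        return sum(i * i for i in range(lo, hi))

    if __name__ == "__main__":
        n, workers = 1_000_000, 4
        step = n // workers
        # Divide: split the problem into equal-sized subtasks.
        chunks = [(i * step, (i + 1) * step) for i in range(workers)]
        with Pool(workers) as pool:
            partials = pool.map(sum_squares, chunks)  # conquer in parallel
        print(sum(partials))  # combine the partial results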
In parallel processing, the CPU is the core component responsible for executing the tasks and subtasks in the programs. Each program consists of different streams, called instruction streams and data streams, which are observed by the CPU during program execution. Therefore, the hardware architecture of parallel computers is characterized by Flynn's classification method. Flynn characterized parallel computers in terms of the number of instruction streams over the data streams. A flow of operands (data) between the processor and memory is called a data stream, while a flow of instructions is called an instruction stream. Flynn's classification depends upon the number of streams flowing at any point of execution. The basic classification stated by Flynn is shown in Figure 1.4.3.
This type of computer performs sequential computation and gives low performance. Examples of SISD are older-generation computers and computers with a single non-pipelined processor.
Each processing element takes the data from its own memory. It is best suited for specialized problems characterized by a high degree of regularity, such as graphics/image processing. It has synchronous (lockstep) and deterministic execution. It uses parallel architectures like array processors and vector pipelines. Most modern computers employ the SIMD architecture shown in Fig. 1.4.5. Examples of SIMD organization are ILLIAC-IV, PEPE, BSP, STARAN, MPP, DAP and the Connection Machine (CM-1).
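The SIMD idea can be illustrated by analogy (not on an actual array processor) with NumPy, where one vectorized operation is applied to many data elements at once instead of a scalar, SISD-style loop:

    # SIMD by analogy: one operation applied to many data elements.
    import numpy as np

    a = np.arange(1_000_000, dtype=np.float64)
    b = np.ones_like(a)

    # SISD-style: one instruction stream, one data element at a time.
    c_scalar = [x + y for x, y in zip(a, b)]

    # SIMD-style: a single vectorized operation over the whole array;
    # NumPy dispatches to hardware vector instructions where available.
    c_vector = a + b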
As MIMD computers are able to run independent programs, many tasks can be performed at the same time. Execution in MIMD can be synchronous or asynchronous, deterministic or non-deterministic. In the real sense, the MIMD architecture is said to be a parallel computer. Examples of MIMD are most current multicore computers, multiprocessor computers, networked computer clusters and supercomputers.
An important characteristic of the shared memory architecture is that there is more than one processor and all processors share the same memory with a global address space. The processors operate independently while sharing the same memory resources, and changes made in a memory location by one processor are visible to all other processors. Based upon memory access time, shared memory is further classified into uniform memory access (UMA) architecture and non-uniform memory access (NUMA) architecture, which are discussed as follows :
1. Uniform memory access (UMA) : A UMA architecture comprises two or more processors with identical characteristics. UMA architectures are also called symmetric multiprocessors. The processors share the same memory and are interconnected by a shared-bus interconnection scheme such that the memory access time is almost the same for all of them. The IBM S/390 is an example of UMA architecture, which is shown in Fig. 1.4.8 (a).
Fig. 1.4.8
In a distributed memory system, the concept of global memory is not used, as each processor uses its own internal (local) memory for computing.
Therefore, changes made by one processor in its local memory have no effect on the memory of other processors, and memory addresses in one processor cannot be mapped to other processors. Distributed memory systems require a communication network to connect inter-processor memory, as shown in Fig. 1.4.9. The distributed memory architecture is also called message-passing architecture. The speed and performance of this type of architecture depend upon the way the processors are connected.
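A minimal Python sketch of this message-passing style: each worker computes on its own local data only, and intermediate results travel over an explicit communication channel rather than a shared memory.

    # Message passing between processes with no shared memory:
    # results are communicated explicitly through a Queue.
    from multiprocessing import Process, Queue

    def worker(rank, local_data, channel):
        # Compute using the processor's local memory only.
        local_sum = sum(local_data)
        # Communicate the intermediate result by passing a message.
        channel.put((rank, local_sum))

    if __name__ == "__main__":
        data = list(range(100))
        channel = Queue()
        procs = [Process(target=worker, args=(r, data[r::4], channel))
                 for r in range(4)]
        for p in procs: p.start()
        results = [channel.get() for _ in procs]  # gather the messages
        for p in procs: p.join()
        print(sum(s for _, s in results))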
The term distributed computing encompasses any architecture or system that allows the computation to be broken down into units and executed concurrently on different computing elements. It is a computing concept that refers to multiple computer systems connected in a network working on a single problem. In distributed computing, a single problem is divided into many parts, and each part is executed by a different computer. As long as the computers are networked, they can communicate with each other to solve the problem. If it is done properly, the computers perform like a single entity. The ultimate goal of distributed computing is to maximize performance by connecting users and IT resources in a cost-effective, transparent and reliable manner. This type of computing is highly scalable. The conceptual view of a distributed system is shown in Fig. 1.4.10.
1.4.3.1 Architectural Models for Distributed System
An architectural model of a distributed system defines the way in which its components interact and how they are mapped onto the underlying network of computers. There are mainly two architectural models for a distributed system, namely the client-server model and the peer-to-peer (P2P) model. The architecture of the client-server model is shown in Fig. 1.4.11 (a) and peer-to-peer in Fig. 1.4.11 (b).
Fig. 1.4.11
Web services are loosely coupled (platform-independent), contractual components that communicate through XML-based (open standard) interfaces. A Web service is composed of a set of operations that can be invoked by leveraging Internet-based protocols. It provides a method for supporting operations with parameters and returning their values using complex and simple types. The semantics of Web services are expressed through an interoperable XML-based protocol called SOAP (Simple Object Access Protocol). SOAP is the communication protocol used in web services, with request-reply primitives. The services are defined in a standardized XML document called WSDL (Web Service Description Language), which expresses simple and complex types in a platform-independent manner. In web services, the UDDI (Universal Description, Discovery and Integration) registry is used for registering the objects called by consumers and published by the providers.
Web services and Web 2.0 are the fundamental building blocks of cloud computing. The front end of recent cloud platforms is mostly built using Web 2.0 and related technologies, while the services of the cloud are delivered through Web services or SOA-based technologies.
On-demand self-service
Resource pooling
Rapid elasticity
Vendor lock-in
Service Quality
Data Protection
Management Capabilities
As cloud services are provided through the internet, they are available on-demand everywhere through a self-service portal.
Q.6 Differentiate between Grid and Cloud Computing. SPPU : Dec.-17
Ans. :
Feature | Grid Computing | Cloud Computing
The flexibility of access to cloud data allows employees to work from home or on holiday.
Sharing of resources and costs : Cloud computing fulfills the users' requirement of accessing resources through a shared pool which can be scaled easily and rapidly to any size. The sharing of resources saves huge costs and makes efficient utilization of the infrastructure.
Minimize spending on technology infrastructure : Public cloud services are readily available, and the pay-as-you-go feature of the cloud lets you access cloud services economically. Therefore, it reduces spending on in-house infrastructure.
Maintenance is easier : As cloud computing services are provided by the service provider through the internet, the maintenance of services is easier and is managed by the cloud service providers themselves.
Less Capital Expenditure : There is no need to spend big money on hardware, software or licensing fees, so capital expenditure is very low.
On-demand self-service : The cloud provides automated provisioning of services on
demand through self-service websites called portals.
Server Consolidation : The increased resource utilization and the reduction in power and cooling requirements achieved by server consolidation are now being extended into the cloud. Server consolidation is an effective approach to maximizing resource utilization while minimizing energy consumption in a cloud computing environment.
Energy Resource Management : Significant savings in the energy of a cloud data center without sacrificing SLAs are an excellent economic incentive for data center operators and would also make a significant contribution to greater environmental sustainability.
Q.8 Enlist any two advantages of distributed systems. SPPU : Dec.-18
Ans. : Advantages of Distributed System are
Supports heterogeneous hardware and software.
The resources shared in the distributed system are easily accessible to the users
across the network.
The distributed system is scalable in such a way that if the number of users or computers increases, the performance of the system is not affected.
It is capable of detecting and recovering from failures; that is, it is fault tolerant and robust.
Q.1 Illustrate the evolution of distributed computing to grid and cloud computing.
SPPU : Dec.-19
Ans. : Evolution of distributed computing to grid and cloud computing
At the initial stage of computing, standalone computers were used to solve large, complex tasks in a sequential manner, called serial computing. In serial computing, large problems were divided into a number of smaller tasks which were solved serially, or sequentially, on standalone computers. The limitations of serial computing were slow computing performance, low transmission speed and hardware limitations. Therefore, the serial computing approach evolved into centralized computing, where a centralized server is used for computation.
Centralized computing is a type of computing architecture where all or most of the processing/computing is performed on a central server. In this type, all the computing resources are interconnected in a centralized manner to a single physical system called the server. Resources like processors, memories and storage are shared under a single integrated operating system. But this technique is limited because the centralized server becomes a bottleneck and a single point of failure. Therefore, the parallel and distributed computing approaches came into the picture, where multiple networked computers are used to solve large-scale problems.
In parallel computing, a complex, large-scale problem is broken into discrete parts that can be solved concurrently. Each part is further broken down into a series of instructions which are executed simultaneously on different processors. The execution time is greatly reduced in parallel computing as compared to serial computing because of the parallel execution. However, as multiple processors and memories are involved in the parallel execution, management of memory addresses and processor address spaces is quite difficult.
Distributed computing evolved with the evolution of networks, where multiple computers interconnected by a network are used to solve a complex problem. It is the opposite of centralized computing. It has a collection of independent computers interconnected by a network, which is used for executing highly computational jobs and appears to its users as a single coherent system. Like parallel computing, a large problem is split into multiple tasks and each task is given to a computer in the group for execution. Each computer in the group is equipped with an independent processor, a local memory and interfaces. Communication between any pair of nodes is handled by message passing, as no common memory is available. The main advantage of distributed computing is location independence, as multiple computers in a group at different geographic locations are used to solve a large problem. Distributed computing then evolved into grid computing.
2 Cloud Enabling Technologies
Syllabus
Service Oriented Architecture - REST and Systems of Systems - Web Services - Publish-Subscribe Model - Basics of Virtualization - Types of Virtualization - Implementation Levels of Virtualization - Virtualization Structures - Tools and Mechanisms - Virtualization of CPU - Memory - I/O Devices - Virtualization Support and Disaster Recovery.
Contents
2.1 Service Oriented Architecture
2.2 REST and Systems of Systems
2.3 Web Services
2.4 Publish-Subscribe Model
2.5 Basics of Virtualization
2.6 Types of Virtualization
2.7 Implementation Levels of Virtualization
2.8 Virtualization Structures
2.9 Virtualization Tools and Mechanisms
2.10 Virtualization of CPU
2.11 Virtualization of Memory
2.12 Virtualization of I/O Device
2.13 Virtualization Support and Disaster Recovery
protocols and technologies like HTTP and XML. It is identified with early efforts on the architectural style of distributed systems, especially Representational State Transfer (REST). These days, REST still offers an alternative to the complex, standards-driven web services technology and is utilized in many Web 2.0 services.
Middleware like an Enterprise Service Bus (ESB) provides an infrastructure for integrating legacy applications and provides services for message translation, message transformation, protocol conversion and message routing, with QoS and security services. A typical SOA architecture is shown in Fig. 2.1.1.
c) Self-Descriptive Messages
A REST message contains a brief description of the message communication along with the processing information. This enables intermediaries to process the message without parsing its contents. REST decouples resources from their representations so that their content can be accessed in a variety of standard formats like HTML, XML, etc. It also provides alternate representations of each resource in multiple formats. The message also contains metadata that can be used for detecting transmission errors, caching control, authentication, authorization and access control.
d) Stateless Communications
In REST, communication is mostly 'stateless', where messages do not have to rely on the state of the conversation. Stateless communication facilitates improved visibility, easier recovery from partial failures and increased scalability. Its limitation is degraded network performance because of the repeated data sent with each request. However, some communication happens through stateful interactions, which perform explicit state transfer via URI rewriting, hidden form fields or cookies. To point to the future state of the communication, the current state can be embedded in a response message. Stateless RESTful web services are mostly scalable in nature, as they can serve a very large number of clients while supporting caching mechanisms, clustering and load balancing.
A common example of a REST web service is Amazon AWS, which uses various REST methods in its Simple Storage Service (S3). The Simple Storage Service uses the bucket as a medium for storing objects, also called items. For manipulating buckets and their objects, it makes HTTP requests using the PUT, GET, POST and DELETE methods.
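As a hedged illustration, the sketch below uses the boto3 SDK, which issues the corresponding REST HTTP requests to S3 under the hood; the bucket and key names are hypothetical, and configured AWS credentials are assumed.

    # S3's REST-style interface via boto3, which sends the underlying
    # HTTP PUT/GET/DELETE requests for you. Bucket/key names below are
    # hypothetical; valid AWS credentials are assumed to be configured.
    import boto3

    s3 = boto3.client("s3")

    # PUT: create an object (item) inside a bucket.
    s3.put_object(Bucket="example-bucket", Key="notes.txt", Body=b"hello")

    # GET: fetch the object back.
    obj = s3.get_object(Bucket="example-bucket", Key="notes.txt")
    print(obj["Body"].read())

    # DELETE: remove the object.
    s3.delete_object(Bucket="example-bucket", Key="notes.txt")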
RESTful web services are mainly used in Web 2.0 applications, where a mashup allows combining the capabilities of one web application with another, for example, taking videos from the online YouTube repository and putting them into a Facebook page.
The Web has been a medium for associating remote clients with applications for quite a long time, and more recently, coordinating applications over the Internet has gained in popularity. The term "web service" frequently refers to an independent, self-describing, modular application intended to be used and accessed by other software applications over the web. In general, Web services are loosely coupled (platform-independent), contracted components (behavior, input and output parameters, and binding specifications are public) that communicate through XML-based (open standard) interfaces. When a web service is deployed, different applications and other web services can find and invoke the deployed service. The functionality of web services is shown in Fig. 2.3.1.
In web services, the service provider is responsible for developing and publishing the various services into the UDDI (Universal Description, Discovery and Integration) registry, which can be accessed by different service consumers. When a consumer wants to invoke a service, it has to query the UDDI registry to find the reference of the service. If the reference of a service registered by a provider is available, the service is bound to the consumer who invoked it. During this phase the consumer gets access to the WSDL (Web Service Description Language) document, which describes the services published by the provider. After binding the service, the consumer can call its methods with parameters using a SOAP request message, and the provider returns the result using a SOAP response message.
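As a hedged illustration of the bind-and-invoke step, the sketch below uses the zeep Python SOAP client; the WSDL URL and the GetTemperature operation are hypothetical, and in the complete flow the WSDL reference would first be discovered via a UDDI registry.

    # Bind/invoke with the zeep SOAP client (hypothetical service).
    from zeep import Client

    # Bind: fetch the WSDL document describing the provider's service.
    client = Client("http://example.com/weather?wsdl")  # hypothetical URL

    # Invoke: zeep wraps the call in a SOAP request message and parses
    # the provider's SOAP response message into native Python values.
    result = client.service.GetTemperature(city="Mumbai")  # hypothetical op
    print(result)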
The UDDI registry stores information about a business process and its services. The three basic functions of UDDI are the publish service, which shows how to register a web service; the find service, which shows how a client finds a web service; and the bind service, which shows how the client connects and interacts with a web service. A UDDI registry is made up of XML-based service descriptors, each of which contains the information needed to find and then bind to a particular web service.
SOAP is an extension of, and an evolved version of, XML-RPC. It uses a remote procedure call protocol with XML for encoding its calls and HTTP as a transport mechanism. In XML-RPC, a call to a procedure is made by the client and executed on the server. The resultant value returned by the server is formatted in XML.
As XML-RPC was not completely aligned with the most recent XML standardization, it did not permit developers to extend the request or response format of an XML-RPC call. SOAP primarily describes the conventions between interacting parties and leaves the data format of the exchanged messages to XML Schema. The significant difference between web services and other technologies like CORBA, J2EE and CGI scripting is standardization, since web services depend on standardized XML, giving a language-independent representation of data. Most web services transmit messages over HTTP, making them accessible as internet-scale applications. The interactions in web services can be either synchronous or asynchronous, making them suitable for both request-response and one-way exchange patterns.
Each layer in the WS protocol stack provides a set of standards and protocols for the successful working of Web services. The bottommost, first layer in the protocol stack is the Transport layer, which is responsible for transporting messages between applications. It supports different protocols based on the type of application, like HTTP, the Simple Mail Transfer Protocol (SMTP), Java Messaging Service (JMS), the Internet Inter-ORB Protocol (IIOP) of CORBA, etc.
The second layer in the protocol stack is the Messaging layer, which is required for encoding in-transit messages in XML or other formats understood by both client and server. This layer provides various protocols like SOAP, WS-Coordination, WS-Transaction and WS-Addressing for web services. SOAP uses XML-based request and response messages to communicate between two parties. WS-Coordination provides protocols that can coordinate the actions of distributed applications; it facilitates transaction processing, workflow management and other systems for coordination, allowing them to hide their proprietary protocols and operate in a heterogeneous environment. The WS-Transaction specification describes the coordination types that are used with the extensible coordination framework to perform transactions. WS-Transaction works on the WS-Coordination protocol, whose communication patterns are asynchronous by default. It defines two coordination types : Atomic Transaction (AT) for individual operations and
Business Activity (BA) for long-running transactions. WS-Addressing provides a transport-neutral mechanism that allows web services to communicate addressing information for Web services and messages. It also gives interoperable constructs that convey information provided by transport protocols and messaging systems.
The third layer in the WS protocol stack is the Service Description layer, which is used for describing the public interface to a specific web service. It is composed of four specifications : WSDL, WS-ResourceProperties, WS-Policy and WS-ServiceGroup. WSDL describes the services offered by the provider and used by the recipient. WS-ResourceProperties provides a set of properties associated with web resources and describes an interface to associate a set of typed values with a WS-Resource. WS-Policy allows web services to use XML to advertise their policies to consumers; it also represents a set of specifications describing the capabilities and constraints of the security policies on intermediaries and endpoints. WS-ServiceGroup describes an interface for operating on collections of WS-Resources.
The fourth layer is the Service Discovery layer, which uses the UDDI registry to register or publish a web service written by the provider and to let consumers discover it for invocation. It centralizes web services into a common registry so that web service providers can publish their services with location and description, making it easy for consumers to discover the services available on the network.
The fifth layer in the protocol stack is the QoS (Quality of Service) layer. It has three specifications, namely WS-ReliableMessaging, WS-Security and WS-ResourceLifetime. WS-ReliableMessaging describes a protocol that allows SOAP messages to be reliably delivered between distributed applications. WS-Security provides a specification that defines how security measures are implemented in web services to protect them from external attacks, and WS-ResourceLifetime describes an interface to manage the lifetime of a WS-Resource.
The sixth layer of the protocol stack is the Composition layer, which is used for the composition of business processes. It has two components, namely BPEL4WS (Business Process Execution Language for Web Services) and WS-Notification. The Business Process Execution Language (BPEL) is a specification for describing business processes in a portable XML format. BPEL4WS is a standard executable language, recommended by OASIS, for specifying interactions between web services, whereby web services can be composed together to make more complex web services and workflows. The goal of BPEL4WS is to complete a business transaction, or the fulfillment of the job of a service.
using Oracle Net Services. It provides features like rule-based subscription, message broadcast, message listening, message notification, high availability (HA), scalability and reliability to the application, queuing system and database.
Load balancing is done by distributing the workload of a heavily loaded machine onto other, lightly loaded machines. By default, virtualization supports dynamic load balancing, which is shown in Fig. 2.5.4.
5) Server Consolidation
Server consolidation in virtualization means aggregating multiple servers and their applications, which previously required many physical computers each with its own operating system, into a single machine. It allows multiple servers to be consolidated into one server, which makes optimum use of that server's resources. It can run legacy software applications with old OS configurations together with new applications running the latest OS inside VMs. The concept of server consolidation is shown in Fig. 2.5.5.
6) Disaster recovery
Disaster recovery is a critical component for IT organizations. It is required when a system crashes due to natural disasters like floods, earthquakes etc. As mission-critical or business-critical applications sometimes run inside the virtual machines, such a crash can create huge business/economic losses. Therefore, virtualization technology provides a built-in disaster recovery feature that enables recovery of virtual machines.
7) Easy VM management
The VMs running on one machine can be easily managed by copying, migrating, templating or snapshotting them onto another machine for backup. They can be easily migrated in case of maintenance, or deleted if they are not in use.
9) Sandboxing
Virtual machines are useful to provide secure, isolated environments (sandboxes) for
running foreign or less-trusted applications. Virtualization technology can, thus, help
build secure computing platforms.
In network virtualization, physical network resources, such as switches and routers, are pooled and made accessible to any user via a centralized management system. The benefits of network virtualization are :
It consolidates the physical hardware of a network into a single virtual network, reducing the management overhead of network resources.
It gives better scalability and flexibility in network operations.
It provides automated provisioning and management of network resources.
It reduces the hardware requirements, with a corresponding reduction in power consumption.
It is cost effective, as it reduces the number of physical devices required.
There are five implementation levels of virtualization : the Instruction Set Architecture (ISA) level, hardware level, operating system level, library support level and application level, which are explained as follows.
a) Bochs
Bochs is a highly portable emulator that can run on most popular platforms, including x86, PowerPC, Alpha, Sun and MIPS. It can be compiled to emulate most versions of x86 machines, including the 386, 486, Pentium, Pentium Pro and AMD64 CPUs, with optional MMX, SSE, SSE2 and 3DNow! instructions.
b) QEMU
QEMU (Quick Emulator) is a fast processor emulator that uses a portable dynamic translator. It supports two operating modes : user-space-only mode and full-system emulation. In the former mode, QEMU can launch Linux processes compiled for one CPU on another CPU, which is useful for cross-compilation and cross-debugging. In the latter mode, it can emulate a full system that includes a processor and several peripheral devices. It supports emulation of a number of processor architectures, including x86, ARM, PowerPC and SPARC.
c) Crusoe
The Crusoe processor comes with a dynamic x86 emulator, called the code morphing engine, that can execute any x86-based application on top of it. Crusoe is designed to
handle the x86 ISA’s precise exception semantics without constraining speculative
scheduling. This is accomplished by shadowing all registers holding the x86 state.
d) BIRD
BIRD is an interpretation engine for x86 binaries that currently supports only x86 as the host ISA, with the aim of extending to other architectures as well. It exploits the similarity between the architectures and tries to execute as many instructions as possible on the native hardware. All other instructions are supported through software emulation.
a) VMware
VMware products are targeted towards x86-based workstations and servers. Thus, VMware has to deal with the complications that arise because x86 is not a fully virtualizable architecture. VMware deals with this problem by using a patent-pending technology that dynamically rewrites portions of the hosted machine code to insert traps wherever VMM intervention is required. Although this solves the problem, it adds some overhead due to translation and execution costs. VMware tries to reduce the cost by caching the results and reusing them wherever possible. Nevertheless, this again adds caching costs that are hard to avoid.
b) Virtual PC
Microsoft Virtual PC is based on the Virtual Machine Monitor (VMM) architecture and lets the user create and configure one or more virtual machines. It provides most of the same functions as VMware, with additional functions including the undo disk operation, which lets the user easily undo previous operations on the hard disks of a VM. This enables easy data recovery and might come in handy in several circumstances.
c) Denali
The Denali project was developed at the University of Washington to address issues related to the scalability of VMs. It introduced a new virtualization architecture, also called para-virtualization, to support thousands of simultaneous machines, which are called lightweight virtual machines. It tries to increase the scalability and performance of virtual machines without too much implementation complexity.
a) Jail
Jail is FreeBSD-based virtualization software that provides the ability to partition an operating system environment while maintaining the simplicity of the UNIX "root"
model. The environments captured within a jail are typical system resources and data structures such as processes, file systems, network resources, etc. A process in a partition is referred to as an "in jail" process. When the system is booted up after a fresh install, no processes will be in jail. When a process is placed in a jail, it and all of its descendants created after the jail's creation remain within the jail. A process may not belong to more than one jail. Jails are created by a privileged process when it invokes the special system call jail(). Every call to jail() creates a new jail; the only way for a new process to enter the jail is by inheriting access to the jail from another process that is already in that jail.
b) Ensim
Ensim virtualizes a server's native operating system so that it can be partitioned into isolated computing environments called virtual private servers. These virtual private servers operate independently of each other, just like dedicated servers. Ensim is commonly used in hosting environments to allocate hardware resources among a large number of distributed users.
Such VMs pose little security threat to the system while letting the user play with them like physical machines. Like a physical machine, a VM has to provide an operating environment to its applications, either by hosting a commercial operating system or by coming up with its own environment.
The comparison between different levels of virtualization is shown in Table 2.7.1.
Implementation Level | Performance | Application Flexibility | Implementation Complexity | Application Isolation
In the hosted structure, the host operating system is responsible for providing hardware drivers to the guest OS instead of the VMM. In this type, the hypervisor has to rely on the host OS for pass-through permission to access the hardware. In many cases, a hosted hypervisor needs an emulator, which lies between the guest OS and the VMM, to translate instructions into native format. The hosted structure is shown in Fig. 2.8.1.
The popular hosted hypervisors are QEMU, VMware Workstation, Microsoft Virtual PC, Oracle VirtualBox etc.
The advantages of the hosted structure are :
It is easy to install and manage without disturbing the host system's hardware.
It supports legacy operating systems and applications.
It provides ease of use with greater hardware compatibility.
It does not require installing any drivers for I/O devices, as they are provided through the host's built-in driver stack.
It can be used for testing beta software.
Hosted hypervisors are usually free software and can be run on user workstations.
The disadvantages of the hosted structure are :
It does not allow the guest OS to access the hardware directly; instead, access has to go through the base OS, which increases resource overhead.
Virtual machine performance is slow and degraded due to the reliance on the intermediate host OS for hardware access.
It does not scale beyond a certain limit.
The popular bare-metal hypervisors are Citrix XenServer, VMware ESXi and Microsoft Hyper-V.
The advantages of the bare-metal structure are :
It is faster in performance and more efficient to use.
It provides enterprise features like high scalability, disaster recovery and high availability.
It has high processing power due to resource pooling.
A) Xen
Xen is an open-source bare-metal (Type 1) hypervisor developed at Cambridge University. It runs on top of the hardware without needing a host operating system. The absence of a host OS eliminates the need for pass-through permission through the hypervisor. Xen is a microkernel hypervisor, which separates the policy from the mechanism. It provides a
virtual environment located between the hardware and the OS. As the Xen hypervisor runs directly on the hardware, it can run many guest operating systems on top of it. The operating system platforms supported as a guest OS by the Xen hypervisor are Windows, Linux, BSD and Solaris.
The architecture of the Xen hypervisor is shown in Fig. 2.9.1.
There are three core components of the Xen system, namely the hypervisor, the kernel and applications, and the organization of these three components is specific. The Xen hypervisor implements all the mechanisms, leaving the policy to be handled by Domain 0.
The guest OS which has control ability is called Domain 0, and the others are called Domain U. Domain 0 is the privileged guest OS of the Xen system and is responsible for controlling the functionality of the entire system. Domain 0, which typically acts like a VMM, is the first to be loaded when Xen starts, before any file system drivers are available. Domain 0 handles the following operations :
Allocates and maps hardware resources to the Domain U (guest) domains.
Manages all other VMs.
Creates, copies, saves, reads, modifies, shares, migrates and rolls back VMs.
Accesses the underlying hardware.
Manages I/O and other devices.
Xen provides a virtual environment situated between the hardware and the OS. The Xen hypervisor does not natively include any device drivers for the guest OS. It provides a
mechanism by which a guest OS can have direct access to the physical devices. That is why the size of the Xen hypervisor is kept rather small. Domain 0 is very crucial to the Xen hypervisor and needs to be protected, because if the security of the Domain 0 OS is compromised by an intruder/hacker, they would gain control of the entire system. As Domain 0 behaves as a VMM, any compromise in its security may allow intruders to create, copy, save, read, modify, share, migrate and roll back VMs as easily as manipulating a file.
In KVM, the Quick Emulator (QEMU) is required for emulating the native and privileged instructions issued by the guest OS. In the KVM architecture, a QEMU process runs as a user-space process on top of the Linux kernel with the KVM module, and a guest kernel runs on top of the emulated hardware in QEMU. QEMU can co-work with KVM for hardware-based virtualization. Using hardware-based virtualization, QEMU does not have to emulate all CPU instructions, and therefore it works really fast.
Some of the important features provided by KVM are :
Supports 32-bit and 64-bit guest OSes (on 64-bit hosts)
Supports hardware virtualization features
Provides para-virtualized drivers for guest OSes
Provides synchronous snapshots
Gives delta images of virtual machines along with PCI passthrough
Kernel same-page merging
Supports CPU and PCI hot-plug features
Has a built-in QEMU Monitor Protocol (QMP) and KVM paravirtual clock
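As a hedged illustration, the sketch below uses the libvirt Python bindings, a common way to manage KVM/QEMU guests; it assumes the libvirt-python package is installed and a local libvirtd with KVM is running.

    # List the virtual machines managed by a local KVM/QEMU host.
    import libvirt

    conn = libvirt.open("qemu:///system")  # connect to the local KVM host
    try:
        for dom in conn.listAllDomains():
            state = "running" if dom.isActive() else "shut off"
            print(f"{dom.name()}: {state}")
    finally:
        conn.close()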
Depending on the position of the virtualization layer, there are several classes of VM mechanisms, namely binary translation, para-virtualization, full virtualization, hardware-assisted virtualization and host-based virtualization. The mechanisms of virtualization defined by VMware and other virtualization providers are explained as follows.
a) Binary translation
In binary translation of the guest OS, the VMM runs at Ring 0 and the guest OS at Ring 1. The VMM checks the instruction stream and identifies the privileged, control- and behavior-sensitive instructions. When these instructions are identified, they are trapped into the VMM, which emulates their behavior. The method used in this emulation is called binary translation. The binary translation mechanism is shown in Fig. 2.9.3.
b) Full Virtualization
In full virtualization, the guest OS does not require any modification to its OS code. Instead, it relies on binary translation to trap and virtualize the execution of certain sensitive, non-virtualizable instructions. Guest operating systems and their applications are composed of critical and noncritical instructions, which are executed with the help of the binary translation mechanism. With full virtualization,
noncritical instructions run on the hardware directly, while critical instructions are discovered and replaced with traps into the VMM, to be emulated by software. (In host-based virtualization, by contrast, both the host OS and the guest OS take part in virtualization, with the virtualization software layer lying between them.)
Thus, full virtualization combines binary translation with the direct execution of instructions, so the guest OS is completely decoupled from the underlying hardware and is consequently unaware that it is being virtualized. Full virtualization gives degraded performance, because instructions must be binary-translated before execution, which is time-consuming. In particular, full virtualization of I/O-intensive applications is a big challenge : binary translation employs a code cache to store translated instructions to improve performance, but this increases the cost of memory usage.
c) Host-based virtualization
In host-based virtualization, the virtualization layer runs on top of the host OS, and the guest OS runs over the virtualization layer. The host OS is therefore responsible for managing the hardware and controlling the instructions executed by the guest OS. Host-based virtualization does not require modifying the host OS code, but the virtualization software has to rely on the host OS to provide device drivers and other low-level services. This architecture simplifies VM design and eases deployment, but it gives degraded performance compared with other hypervisor architectures because of the host OS interventions. The host OS performs four layers of mapping during any I/O request by the guest OS or VMM, which downgrades performance significantly.
2.9.2.2 Para-Virtualization
The x86 processor uses four instruction execution rings, namely Rings 0, 1, 2 and 3. Ring 0 carries the highest privilege for instruction execution while Ring 3 carries the lowest. The OS is responsible for managing the hardware and executing the privileged instructions at Ring 0, while user-level applications run at Ring 3. The KVM hypervisor is an example of para-virtualization. The functioning of para-virtualization is shown in Fig. 2.9.5.
Among these, Rings 0, 1 and 2 are associated with the operating system while Ring 3 is reserved for applications, thereby managing access to the computer hardware. Ring 0 is used by the kernel and therefore has the highest privilege level, while Ring 3 has the lowest privilege, as it belongs to user-level applications, as shown in Fig. 2.10.1.
While user-level applications typically run in Ring 3, the operating system needs direct access to the memory and hardware and must execute its privileged instructions in Ring 0. Therefore, virtualizing the x86 architecture requires placing a virtualization layer under the operating system to create and manage the virtual machines that deliver shared resources. However, some of the sensitive instructions cannot be virtualized, as they have different semantics, and trapping and translating these sensitive and privileged instructions at runtime becomes the challenge. The x86 privilege-level architecture without virtualization is shown in Fig. 2.10.2.
In CPU virtualization, the unprivileged instructions of a VM run directly on the host processor while the VMM runs in supervisor mode. When the privileged instructions, along with the control- and behavior-sensitive instructions, of a VM are executed, they are trapped in the VMM. In such scenarios, the VMM becomes the unified mediator for hardware access from the different VMs and guarantees the correctness and stability of the whole system.
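The trap-and-emulate idea can be sketched in a few lines of Python. The toy "instruction set" below is invented purely for illustration: unprivileged instructions execute directly, while privileged ones trap to a VMM handler that emulates their effect.

    # Toy trap-and-emulate dispatch loop (illustrative only).
    PRIVILEGED = {"HLT", "OUT"}

    def vmm_emulate(instr, vm_state):
        # The VMM mediates privileged operations on behalf of the guest.
        if instr == "HLT":
            vm_state["halted"] = True
        elif instr == "OUT":
            print("VMM: emulated I/O for guest")

    def run_guest(program):
        vm_state = {"halted": False, "acc": 0}
        for instr in program:
            if vm_state["halted"]:
                break
            if instr in PRIVILEGED:
                vmm_emulate(instr, vm_state)   # trap into the VMM
            else:
                vm_state["acc"] += 1           # direct execution
        return vm_state

    print(run_guest(["ADD", "ADD", "OUT", "HLT"]))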
However, not all CPU architectures are virtualizable. Three techniques can be used for handling sensitive and privileged instructions to virtualize the CPU on the x86 architecture :
1) Binary translation with full virtualization
2) OS-assisted virtualization, or para-virtualization
3) Hardware-assisted virtualization
These techniques are explained in detail as follows.
Privileged and sensitive calls are set to trap automatically to the hypervisor running on the hardware, which removes the need for either binary translation or para-virtualization. Fig. 2.10.5 shows hardware-assisted virtualization.
The guest OS is responsible for controlling the mapping of virtual addresses to the guest's physical memory addresses, but the guest OS cannot have direct access to the actual machine memory. The VMM is responsible for mapping the guest physical memory to the actual machine memory, and it uses shadow page tables to accelerate the mappings. The VMM uses the TLB (Translation Lookaside Buffer) hardware to map the virtual memory directly to the machine memory, avoiding the two levels of translation on every access. When the guest OS changes its virtual-to-physical memory mapping, the VMM updates the shadow page tables to enable a direct lookup. Hardware-assisted memory virtualization in AMD processors provides hardware assistance for the two-stage address translation in a virtual execution environment, using a technology called nested paging.
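The two-level mapping can be sketched with ordinary Python dictionaries standing in for page tables; all page numbers below are invented for illustration.

    # Guest virtual -> guest physical (guest OS page table) and
    # guest physical -> machine (VMM table). The shadow table caches
    # the composed mapping so a lookup needs only one translation.
    guest_page_table = {0: 10, 1: 11}   # guest virtual page -> guest physical page
    vmm_p2m_table    = {10: 7, 11: 3}   # guest physical page -> machine page

    # The VMM builds the shadow table by composing the two mappings.
    shadow_table = {gv: vmm_p2m_table[gp]
                    for gv, gp in guest_page_table.items()}

    def translate(guest_virtual_page):
        # Hardware walks the shadow table directly: one lookup, not two.
        return shadow_table[guest_virtual_page]

    print(translate(0))   # guest virtual page 0 -> machine page 7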
The virtual devices shown in Fig. 2.12.1 can effectively emulate well-known hardware and translate the virtual machine's requests to the system hardware. Standardized device drivers help with virtual machine standardization. Portability in I/O virtualization allows all the virtual machines across platforms to be configured and run on the same virtual hardware, regardless of the actual physical hardware in the system. There are four methods of implementing I/O virtualization, namely full device emulation, para-virtualization, direct I/O virtualization and self-virtualized I/O. The full device emulation approach emulates well-known real-world devices, where all the functions of a device, such as enumeration, identification, interrupts and DMA, are replicated in software. The para-virtualization method of I/O virtualization uses a split driver model consisting of frontend and backend drivers : the frontend driver runs in Domain U and manages the I/O requests of the guest OS, while the backend driver runs in Domain 0 and manages the real I/O devices, multiplexing the I/O data of different VMs; the two interact via a block of shared memory. Direct I/O virtualization lets the VM access devices directly and mainly focuses on networking for mainframes. These methods are discussed in more detail below.
In full device emulation, the I/O devices are virtualized using emulation software. This method can emulate any well-known, real-world device. The emulation software is responsible for performing all the functions of a device or bus infrastructure, such as device enumeration, identification, interrupts and DMA, which are replicated in software. The software runs inside the VMM and acts as a virtual device. In this method, the I/O access
requests of the guest OS are trapped in the VMM, which interacts with the I/O devices. Multiple VMs can share a single hardware device and run concurrently. However, software emulation consumes more time per I/O access, which is why it runs much slower than the hardware it emulates.
In the para-virtualization method of I/O virtualization, the split driver model is used, which consists of a frontend driver and a backend driver. It is used in the Xen hypervisor with its Domain 0 and Domain U domains. The frontend driver runs in Domain U while the backend driver runs in Domain 0, and the two drivers interact with each other via a block of shared memory. The frontend driver is responsible for managing the I/O requests of the guest OSes, while the backend driver is responsible for managing the real I/O devices and multiplexing the I/O data of different VMs.
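A rough sketch of this split-driver idea, with a Python queue standing in for the block of shared memory (the names and data are invented for illustration):

    # Split-driver model: a frontend (guest, Domain U) posts I/O
    # requests to shared memory; a backend (Domain 0) services them
    # against the real device, multiplexing requests from several VMs.
    from queue import Queue

    shared_ring = Queue()  # stands in for the block of shared memory

    def frontend(vm_id, data):
        # Guest-side driver: forwards the guest OS's I/O request.
        shared_ring.put((vm_id, data))

    def backend():
        # Domain 0 driver: demultiplexes and performs the real device I/O.
        while not shared_ring.empty():
            vm_id, data = shared_ring.get()
            print(f"device write for VM {vm_id}: {data}")

    frontend(1, "block 42")
    frontend(2, "block 7")
    backend()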
The para-virtualization method of I/O virtualization achieves better device
performance than full device emulation but with a higher CPU overhead.
In direct I/O virtualization, the virtual machines can access I/O devices directly, without relying on any emulator in the VMM. It can give better I/O performance, without high CPU costs, than the para-virtualization method. It was designed with a focus on networking for mainframes.
In the self-virtualized I/O method, the rich resources of a multicore processor are
harnessed. Self-virtualized I/O encapsulates all the tasks related to virtualizing an I/O
device. It provides virtual devices with an associated access API to the VMs and a
management API to the VMM, and it defines one Virtual Interface (VIF) for every kind
of virtualized I/O device.
The virtualized I/O interfaces include virtual network interfaces, virtual block devices
(disks), virtual camera devices and others. The guest OS interacts with the virtual
interfaces via device drivers. Each VIF carries a unique ID for identifying it in
self-virtualized I/O and consists of two message queues : one for outgoing messages to
the device and another for incoming messages from the device.
As commodity hardware devices pose many challenges, multiple I/O virtualization
techniques may need to be combined to eliminate problems such as system crashes
during reassignment of I/O devices, incorrect functioning of I/O devices and the high
overhead of device emulation.
not have to bother about the physical servers through which the services are provisioned,
and application developers need not worry about network issues or infrastructure
problems such as scalability, latency and fault tolerance.
Virtualization software is used in most cloud computing systems to virtualize the
hardware.
It simulates hardware execution and can even run unmodified operating systems. Some
of the prominent advantages of virtualization for cloud computing are :
Support for legacy software applications and old operating systems.
A readily available development and deployment environment for developers to
build cloud applications with a wide variety of tools and platforms.
On-demand provisioning of virtual machines along with unmatched scalability.
Flexibility for users and developers to use the platform.
High throughput, high availability and effective load balancing.
Disaster recovery along with centralized resource and data management, and so on.
The functional representation of virtualization in cloud computing is
shown in Fig. 2.13.1. (See Fig. 2.13.1 on next page.)
Some of the applications of virtualization are given as follows.
a) Virtualization for Public cloud platform
Today, every public cloud service provider uses virtualization to save physical
resources, energy and manpower while making cloud services easier to access, more
effective and reliable. Cloud service providers like AWS, Google or Microsoft provide
freedom for their customers to develop and deploy applications on their cloud
platforms seamlessly. Because of this, everyone today is interested in using public
cloud services, which are deployed on virtualization solutions.
b) Virtualization for Green Data Centers
As we know, because of the huge power consumption of physical servers and other
equipment in data centers, IT power consumption has reached a remarkable figure, and
many countries face an energy crisis to a great extent.
Virtualization can therefore be used to lower power consumption and effectively
reduce cost in IT data centers. It makes a great impact on cost reduction and power
consumption by consolidating many physical servers into fewer machines.
This is where the concept of Green Data Centers comes into the picture : storage and
other virtualization mechanisms are used to minimize power, energy and cost as well
as the number of physical servers.
with APIs, improve the security of applications by building sandbox environments over
VMs, and provide better QoS and performance isolation to applications on the
virtualized cloud platform.
Any program run under a VMM should exhibit behaviour identical to that of the
same program run directly on the original physical machine.
The VMM must be closely tied to the architecture of the processor.
Virtual machine data should be movable and copyable as easily as files are moved
and copied.
Service reusability
Platform integration
deployed service. The term "web service" is often used to refer to an independent,
self-describing, modular application intended to be used by, and accessible to, other
software applications over the web.
Q.10 What are different characteristics of SOA ?
Ans. : The different characteristics of SOA are as follows :
Provides interoperability between the services.
Relative merits at each virtualization implementation level (columns : higher
performance, application flexibility, implementation complexity, application isolation) :
Instruction Set Architecture Level (ISA)    Very Poor    Very Good    Medium       Medium
Hardware Abstraction Level (HAL)            Very Good    Medium       Very Good    Good
Operating System Level                      Very Good    Poor         Medium       Poor
It provides enterprise features like high scalability, disaster recovery and high
availability.
It has high processing power due to resource pooling.
It requires specialized servers to install and run the hypervisor and does not run on
user workstations.
In some cases, its management becomes complex.
Q.1 “Virtualization is the wave of the future”. Justify. Explicate the process of CPU,
memory and I/O device virtualization in a data center. AU : May-18
Q.3 What is virtualization ? Describe para and full virtualization architectures, compare
and contrast them. AU : Dec.-17
Ans. : Refer sections 2.9.2.2 and 2.10. In brief : para virtualization gives better
performance than full virtualization, while full virtualization gives lower performance
than para virtualization.
Q.4 Illustrate the architecture of virtual machine and brief about the operations.
AU : Dec.-16
Ans. : Refer section 2.8, structures of virtualization (Hosted and Bare-Metal).
Q.5 Write short note on Service Oriented Architecture. AU : Dec.-16
Ans. : Refer section 2.1.
Q.6 Discuss how virtualization is implemented in different layers. AU : May-17
Ans. : Refer section 2.7, Implementation Levels of Virtualization.
Q.7 Analyse how the virtualization technology supports the cloud computing. AU : May-19
Ans. : Refer section 2.13.
Q.8 Write a detailed note on web services.
Ans. : Refer section 2.3.
Q.9 Explain in detail web services protocol stack and publish-subscribe models with
respect to web services.
Ans. : Refer section 2.3.1 and 2.4.
3
Cloud Architecture, Services
and Storage
Syllabus
Layered cloud architecture design - NIST cloud computing reference architecture - Public, Private
and Hybrid clouds - IaaS - PaaS - SaaS - Architectural design challenges - Cloud storage -
Storage-as-a-Service - Advantages of cloud storage - Cloud storage providers - S3.
Contents
3.1 Cloud Architecture Design
3.2 NIST Cloud Computing Reference Architecture
3.3 Cloud Deployment Models
3.4 Cloud Service Models
3.5 Architectural Design Challenges
3.6 Cloud Storage
3.7 Storage as a Service
3.8 Advantages of Cloud Storage
3.9 Cloud Storage Providers
3.10 Simple Storage Service (S3)
built into the data centers, which are typically owned and operated by a third-party
provider. The next section explains the layered architecture design for cloud platforms.
virtual machines or virtual servers along with virtual storage. The abstraction of these
hardware resources is intended to provide flexibility to users. Internally, virtualization
performs automated resource provisioning and optimizes the process of managing
resources. The infrastructure layer acts as the foundation for building the second
layer, called the platform layer, which supports PaaS services.
The platform layer is responsible for providing cloud users with a readily available
development and deployment platform for web applications, without requiring
installation on a local device. The platform layer has a collection of software tools for
developing, deploying and testing software applications. This layer provides an
environment for users to create their applications, test operation flows, track
performance and monitor execution results. The platform must ensure scalability,
reliability and security. In this layer, the virtualized cloud platform acts as
"application middleware" between the cloud infrastructure and the application layer of
the cloud. The platform layer is the foundation for the application layer.
A collection of all software modules required for SaaS applications forms the
application layer. This layer is mainly responsible for on-demand application
delivery. In this layer, software applications include day-to-day office management
software used for information collection, document processing, calendar and
authentication. Enterprises also use the application layer extensively in business
marketing, sales, Customer Relationship Management (CRM), financial transactions and
Supply Chain Management (SCM). It is important to remember that not all cloud services
are limited to a single layer; many applications can require resources from mixed
layers. After all, the three layers are constructed bottom-up, with a relation of
dependency between them. From the perspective of the user, the services at the various
levels need specific amounts of vendor support and resource management for
functionality. In general, SaaS needs the provider to do much more work, PaaS is in
the middle and IaaS requires the least. The best example of the application layer is
Salesforce.com's CRM service, where the vendor supplies not only the hardware at the
bottom layer and the software at the top layer, but also the platform and software
tools for user application development and monitoring.
The NIST team works closely with leading IT vendors, standards developers,
industries and other governmental agencies at a global level to support
effective cloud computing security standards and their further development. It is
important to note that this NIST cloud reference architecture does not belong to any
specific vendor's products, services or reference implementation, nor does it prevent
further innovation in cloud technology.
The NIST reference architecture is shown in Fig. 3.2.1.
Fig. 3.2.1 : Conceptual cloud reference model showing different actors and entities
From Fig. 3.2.1, note that the cloud reference architecture includes five major actors :
Cloud consumer
Cloud provider
Cloud auditor
Cloud broker
Cloud carrier
Each actor is an organization or entity that plays an important role in a transaction or
process, or performs some important task in cloud computing. The interactions between
these actors are illustrated in Fig. 3.2.2.
Now, understand that a cloud consumer can request cloud services directly from a
CSP or from a cloud broker. The cloud auditor independently audits and then contacts
other actors to gather information. We will now discuss the role of each actor in detail.
Example 1 : Cloud consumer requests the service from the broker instead of directly
contacting the CSP. The cloud broker can then create a new service by combining
multiple services or by enhancing an existing service. Here, the actual cloud provider is
not visible to the cloud consumer. The consumer only interacts with the broker. This is
illustrated in Fig. 3.2.3.
Example 2 : In this scenario, the cloud carrier provides for connectivity and transports
cloud services to consumers. This is illustrated in Fig. 3.2.4.
In Fig. 3.2.4, the cloud provider participates by arranging two SLAs. One SLA is with
the cloud carrier (SLA2) and the second SLA is with the consumer (SLA1). Here, the
cloud provider will have an arrangement (SLA) with the cloud carrier to have secured,
encrypted connections. This ensures that the services are available to the consumer at a
consistent level to fulfil service requests. The provider can specify requirements,
such as flexibility, capability and functionalities, in SLA2 in order to fulfil the essential
service requirements of SLA1.
Example 3 : In this usage scenario, the cloud auditor conducts independent evaluations
for a cloud service. The evaluations will relate to operations and security of cloud service
implementation. Here the cloud auditor interacts with both the cloud provider and
consumer, as shown in Fig. 3.2.5.
In all the given scenarios, the cloud consumer plays the most important role. Based on
the service request, the activities of other players and usage scenarios can differ for other
cloud consumers. Fig. 3.2.6 shows an example of available cloud service types.
In Fig. 3.2.6, note that SaaS applications are available over a network to all consumers.
These consumers may be organisations with access to software applications, end users,
app developers or administrators. Billing is based on the number of end users, the time of
use, the network bandwidth consumed and the amount or volume of data stored.
PaaS consumers can utilize the tools, execution resources and development IDEs made
available by cloud providers. Using these resources, they can test, develop, manage,
deploy and configure many applications hosted on a cloud. PaaS consumers are
billed based on the processing, database, storage and network resources consumed, and
on the duration of platform use.
On the other hand, IaaS consumers can access virtual computers, network-attached
storage, network components, processor resources and other computing resources on
which they can deploy and run arbitrary software. IaaS consumers are billed based on the
amount and duration of hardware resources consumed, the number of IP addresses, the
volume of data stored, network bandwidth, and CPU hours used over a certain duration.
Service Arbitrage : This is similar to aggregation, except for the fact that services
that are aggregated are not fixed. In service arbitrage, the broker has the liberty to
choose services from different agencies.
7. It is cheaper than an in-house cloud implementation because users pay only for what
they have used.
8. The resources are easily scalable.
Sr. No.  Parameter     Public cloud              Private cloud                   Hybrid cloud                    Community cloud
7        Network       Internet                  Intranet                        Intranet and Internet           Internet
8        Availability  For general public        Organization's internal staff   For general public and          For community members
                                                                                 organization's internal staff
9        Example       Windows Azure, AWS etc.   Openstack, VMware cloud,        Combination of Openstack        Salesforce community
                                                 CloudStack, Eucalyptus etc.     and AWS
From Fig. 3.4.1, we can see that Infrastructure as a Service (IaaS) is the bottommost
layer in the model and Software as a Service (SaaS) lies at the top. IaaS has a lower
level of abstraction and visibility, while SaaS has the highest level of visibility.
Fig. 3.4.2 represents the cloud stack organization, from physical infrastructure to
applications. In this layered architecture, the abstraction levels are visible : each
higher-layer service includes the services of the underlying layer.
As you can see in Fig. 3.4.2, the three services, IaaS, PaaS and SaaS, can exist
independent of one another or may combine with one another at some layers. The
different layers in every cloud computing model are managed either by the user or by the vendor
(provider). In the traditional IT model, all the layers or levels are managed by the
user, who is solely responsible for managing and hosting the applications. In
IaaS, the top five layers are managed by the user, while the four lower layers
(virtualisation, server hardware, storage and networking) are managed by vendors or
providers. So, here, the user is accountable for managing everything from the operating
system up to the applications, including databases and application security. In PaaS, the
user needs to manage only the application; all the other layers of the cloud computing
stack are managed by the vendor. Lastly, SaaS abstracts the user from all the layers, as all
of them are managed by the vendor and the user is responsible only for using the
application.
The core middleware manages the physical resources, and the VMs are deployed on
top of them. This deployment provides the features of pay-per-use services and
multi-tenancy. Infrastructure services support cloud development environments and
provide capabilities for application development and implementation, offering different
libraries, programming models, APIs, editors and so on to support application
development. Once such a deployment is ready on the cloud, it can be used by end
users and organisations. With this idea, let us further explore the different service models.
In IaaS, the customer has control over the OS, storage and installed applications, but
has limited control over network components. The user cannot control the underlying
cloud infrastructure. Services offered by IaaS include web servers, server hosting,
computer hardware, OS, virtual instances, load balancing and bandwidth provisioning.
These services are useful during volatile demand, when computing resources are needed
for a new business launch, when a company does not want to buy hardware, or when an
organisation wants to expand.
In this model, users interact with the software to append and retrieve data, perform
actions, obtain results from a process task and carry out other actions allowed by the
PaaS vendor. In this service model, the customer bears no responsibility for
maintaining the hardware, the software or the development environment. The applications
created are the only interaction between the customer and the PaaS platform. The PaaS
cloud provider owns responsibility for all operational aspects, such as maintenance,
updates, management of resources and the product lifecycle. A PaaS customer can control
services such as device integration, session management, content management, sandboxing
and so on. In addition to these services, customer controls are also possible in Universal
Description, Discovery and Integration (UDDI), a platform-independent Extensible
Markup Language (XML) based registry that allows registration and identification of web
service apps.
Let us consider the example of Google App Engine.
The platform allows developers to program apps using Google's published APIs. In
this platform, Google defines the tools to be used within the development framework, the
file system structure and the data stores. A similar PaaS offering is given by Force.com,
another vendor, which is based on the Salesforce.com development platform for the latter's
SaaS offerings. Force.com provides an add-on development environment.
In PaaS, note that developers can build an app with Python and the Google API. Here,
the PaaS vendor offers a complete solution to the user; for instance, Google acts as a PaaS
vendor and offers web service apps to users. Other examples are : Google Earth,
Google Maps, Gmail, etc. A minimal sketch of such an app follows.
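The following is a minimal sketch (setup details assumed, not a definitive App Engine deployment) of the kind of Python web app a PaaS can host : the developer supplies only this code, while the platform supplies the runtime, scaling and infrastructure. Flask is a common choice for App Engine's Python runtime.

# Minimal PaaS-style web app sketch in Python using Flask (pip install flask).
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home() -> str:
    # The platform routes incoming HTTP requests to this handler.
    return "Hello from a PaaS-hosted app!"

if __name__ == "__main__":
    # Local testing only; in production the platform runs the app itself.
    app.run(port=8080)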
PaaS has a few disadvantages. It locks the developer into a solution specific to a
platform vendor. For example, an application developed in Python using the Google API
on Google App Engine might work only in that environment.
PaaS is also useful in the following situations :
When the application must be portable.
When proprietary programming languages are used.
When there is a need for custom hardware and software.
Major PaaS applications include software development projects where developers and
users collaborate to develop applications and automate testing services.
3.4.2.1 Power of PaaS
PaaS offers promising services and continues to offer a growing list of benefits. The
following are some standard features that come with a PaaS solution :
Source code development : PaaS solutions provide the users with a wide range of
language choices including stalwarts such as Java, Perl, PHP, Python and Ruby.
Websites : PaaS solutions provide environments for creating, running and
debugging complete websites, including user interfaces, databases, privacy and
security tools. In addition, foundational tools are also available to help developers
update and deliver new web applications to meet the fast-changing needs and
requirements of their user communities.
Developer sandboxes : PaaS also provides dedicated “sandbox” areas for
developers to check how snippets of a code perform prior to a more formal test.
Sandboxes help the developers to refine their code quickly and provide an area
where other programmers can view a project, offer additional ideas and suggest
changes or fixes to bugs.
The advantages of PaaS go beyond relieving the overheads of managing servers,
operating systems and development frameworks. PaaS resources can be provisioned and
scaled quickly, within days or even minutes, because the organisation does not have to
host any infrastructure on premises. In fact, PaaS may also help organisations reduce
costs, since the multitenancy model of cloud computing allows multiple entities to share
the same IT resources. Interestingly, the costs are predictable because the fees are
pre-negotiated every month.
The following boosting features can empower a developer’s productivity, if efficiently
implemented on a PaaS site :
Fast deployment : For organisations whose developers are geographically scattered,
seamless access and fast deployment are important.
Integrated Development Environment (IDE) : PaaS must provide the developers
with Internet - based development environment based on a variety of languages,
such as Java, Python, Perl, Ruby etc., for scripting, testing and debugging their
applications.
Database : Developers must be provided with access to data and databases. PaaS
must provision services such as accessing, modifying and deleting data.
Identity management : Some mechanism for authentication management must be
provided by PaaS. Each user must have a certain set of permissions with the
administrator having the right to grant or revoke permissions.
Integration : Leading PaaS vendors, such as Amazon, Google App Engine or
Force.com, provide integration with external or web-based databases and services.
This is important to ensure compatibility.
Logs : PaaS must provide APIs to open and close log files, write and examine log
entries and send alerts for certain events. This is a basic requirement of application
developers irrespective of their projects.
Caching : This feature can greatly boost application performance. PaaS must make
available a tool for developers to send a resource to cache and to flush the cache.
3.4.2.2 Complications with PaaS
Co-tenants, who share the same resources, may mutually attack each other's objects,
and third parties may attack a user's object; objects need to be securely coded to
defend themselves.
Cryptographic methods, namely symmetric and asymmetric encryption, hashing
and signatures, are the solution to object vulnerability. It is the responsibility of the
providers to protect the integrity and privacy of user objects on a host.
Vendor lock-in : Owing to the lack of standardisation, vendor lock-in becomes a
key barrier that stops users from migrating to cloud services. Technology-related solutions
are being built to tackle this problem. Most customers are unaware of
the terms and conditions of the providers that prevent interoperability and portability of
applications. A number of strategies have been proposed on how to avoid or lessen lock-in
risks before adopting cloud computing.
Lock-in issues arise when a company decides to change cloud providers but is unable
to migrate its applications or data to a different vendor. This heterogeneity of cloud
semantics creates technical incompatibility, which in turn leads to interoperability and
portability challenges. This makes interoperation, collaboration, portability and
manageability of data and services a very complex task.
compatibility is eliminated.
SaaS has the capacity to support multiple users.
In spite of the above benefits, there are some drawbacks of SaaS. For example, SaaS is
not suited for applications that need real-time response, or for applications whose data
cannot be hosted externally.
To protect against cloud attacks, one can encrypt data before placing it in the cloud.
In many countries, there are laws that require SaaS providers to keep consumer data and
copyrighted material within national boundaries; these are also called compliance or
regulatory standards. Many countries still do not have compliance laws; therefore, it is
necessary to check the cloud service provider's SLA for how compliance is executed for
its services. A minimal sketch of the encrypt-before-upload idea is given below.
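As an illustration of encrypting data before placing it in the cloud, the following Python sketch uses the third-party cryptography package's Fernet recipe; the package choice and the key handling are assumptions made for the example, not a mandated scheme.

# Sketch: encrypt data locally before uploading it to cloud storage.
# Requires the 'cryptography' package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # keep this key safely, outside the cloud
cipher = Fernet(key)

plaintext = b"confidential business record"
token = cipher.encrypt(plaintext)    # this ciphertext is what gets uploaded

# ... later, after downloading the object back from the cloud ...
assert cipher.decrypt(token) == plaintext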
of Service (DDoS) attacks are another obstacle to availability. Criminals try to cut
SaaS providers' revenues by making their services unavailable. Some utility computing
services give SaaS providers the ability to use quick scale-ups to protect themselves
against DDoS attacks.
In some cases, the lock-in concern arises from the risk that a single company
providing cloud storage may fail. In addition, the vendor lock-in solutions of some cloud
service providers make it difficult for organizations to migrate to a new cloud service
provider. Therefore, to mitigate the challenges of data lock-in and vendor lock-in,
software stacks can be used to enhance interoperability between various cloud platforms,
and APIs can be standardized to rescue data from loss due to a single company's failure.
Standardization also supports "surge computing", in which public and private clouds
share the same technological framework and the public cloud is used to catch additional
tasks that cannot be performed efficiently in a private cloud's data center.
Intel and AMD technologies and support legacy load balancing hardware to avoid the
challenges related to interoperability.
receive are not stored on your local hard disks but are kept on the email provider's
server; none of the data is stored on your local hard drives.
It is true that all computer owners store data, and for these users, finding enough
storage space to hold all the data they have accumulated can seem like an impossible
mission. Earlier, people stored information on the computer's hard drive or other local
storage devices; today, this data is saved in a remote database, with the Internet
providing the connection between the computer and the database. Fig. 3.6.1 illustrates
how cloud storage works.
People may store their data on large hard drives or other external storage devices like
thumb drives or compact discs. But with cloud, the data is stored in a remote database.
Fig. 3.6.1 consists of a client computer, which has a bulk of data to be stored and the
control node, a third-party service provider, which controls several databases together.
Cloud storage system has storage servers. The subscriber copies their files to the storage
servers over the internet, which will then record the data. If the client needs to retrieve the
data, the client accesses the data server with a web - based interface, and the server either
sends the files back to the client or allows the client to access and manipulate the data
itself.
Cloud storage is a service model in which data is maintained, managed and backed up
remotely and made available to users over a network. Cloud storage provides extremely
efficient storage of objects that scales to exabytes of data. It allows data to be accessed
instantly from any storage class, storage to be integrated into applications through a
single unified API, and performance to be optimized with ease. It is the responsibility of
cloud storage providers
to keep the data available and accessible and to secure and run the physical environment.
Even though data is stored and accessed remotely, you can maintain data both locally and
on the cloud as a measure of safety and redundancy.
At minimum, a cloud storage system requires one data server connected to the
internet. The client sends copies of its files to that data server, which saves the
information and sends the files back to the client on request. Through a web-based
interface, the server also allows the client to access and change the files on the server
itself, whenever he or she wants. The connection between the computer and the database
is provided by the internet. In practice, however, cloud storage services use tens or
hundreds of data servers. Since servers need maintenance or repair, it is important to
store the same data on several machines, providing redundancy. Without redundancy,
cloud storage services could not guarantee clients that they would be able to access their
information at any given time. There are two techniques used for storing data on the
cloud, called cloud sync and cloud backup, which are explained as follows.
storage to end users who lack the budget or capital to pay for it on their own.
End users store their data in rented storage space at a remote location on the cloud.
Storage as a service providers rent their storage space to organizations on a
cost-per-gigabyte-stored or cost-per-data-transfer basis. The end user does not have to
pay for the infrastructure; they pay only for how much data they transfer to and save on
the provider's servers. A toy illustration of this pricing model follows.
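As a small worked example of the cost-per-gigabyte pricing model just described, the following Python sketch computes a hypothetical monthly bill; the rates are invented for illustration and are not any provider's real tariff.

# Hypothetical storage-as-a-service bill: pay per GB stored and per GB transferred.
STORAGE_RATE_PER_GB = 0.02    # invented monthly rate, $/GB stored
TRANSFER_RATE_PER_GB = 0.09   # invented rate, $/GB transferred out

def monthly_bill(gb_stored: float, gb_transferred: float) -> float:
    return gb_stored * STORAGE_RATE_PER_GB + gb_transferred * TRANSFER_RATE_PER_GB

# 500 GB kept on the provider's servers, 80 GB downloaded during the month.
print(f"${monthly_bill(500, 80):.2f}")  # $17.20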
Storage as a service is a good alternative for small or mid-size businesses that
lack the capital budget to implement and maintain their own storage infrastructure. The
key providers of storage as a service are Amazon S3, Google Cloud Storage, Rackspace,
Dell EMC, Hewlett Packard Enterprise (HPE), NetApp, IBM, etc. It is also being
promoted as a way for all companies to mitigate their risks in disaster recovery, provide
long-term retention of records and enhance both business continuity and availability.
Small-scale enterprises find it very difficult and costly to buy dedicated storage
hardware for data storage and backup. This issue is addressed by storage as a service,
a business model that helps small companies rent storage from large companies that
have wider storage infrastructure. It is also suitable when technical staff are not
available or have insufficient experience to implement and manage the storage
infrastructure.
Individuals as well as small companies can use storage as a service to save costs and
manage backups; they save on hardware, personnel and physical space. Storage
as a service is also called hosted storage, and the companies providing it are called
Storage Service Providers (SSPs). These vendors promote storage as a service as a
suitable way of managing backups in the enterprise, targeting secondary storage
applications. It also helps in mitigating the effects of disasters.
Storage providers are responsible for storing data of their customers using this model.
The storage provider provides the software required for the client to access their stored
data on cloud from anywhere and at any time. Customers use that software to perform
standard storage related activities, including data transfers and backups. Since storage as
a service vendors agree to meet SLAs, businesses can be assured that storage can scale
and perform as required. It can facilitate direct connections to both public and private
cloud storage.
In most instances, organizations that use storage as a service opt for the public cloud for
storage and backup purposes instead of keeping data on premises. The methods provided
by storage as a service include backup and restore, disaster recovery, block storage, SSD
storage, object storage and transmission of bulk data. Backup and restore refers to
backing data up to the cloud, which provides protection and recovery when data loss occurs.
Disaster recovery may refer to protecting and replicating data from Virtual Machines
(VMs) in case of disaster. Block storage allows customers to provision block storage
volumes for lower-latency I/O. SSD storage is another type of storage, generally used for
data-intensive read/write and I/O operations. Object storage systems are used in data
analytics, disaster recovery and cloud applications. Cold storage is typically used for
retaining data that is accessed infrequently. Bulk data transfers can use disks and other
equipment for bulk data transmission.
There are many cloud storage providers available on the internet; some of the
popular storage as a service providers are listed as follows :
Google Drive - Google provides Google Drive as a storage service for every
Gmail user, who can store up to 15 GB of data free of cost, scalable up to ten
terabytes. It allows Google Docs, embedded with the Google account, to be used to
upload documents, spreadsheets and presentations to Google's data servers.
Microsoft OneDrive - Microsoft provides OneDrive with 5 GB of free storage space,
scalable to 5 TB, for storing users' files. It is embedded with Microsoft 365
and Outlook mail. It synchronizes files between the cloud and a local
folder, and provides client software for any platform to store and access
files from multiple devices. It backs up files with ransomware protection
and allows recovering previously saved versions of files or data from the
cloud.
Dropbox - Dropbox is a file hosting service that offers cloud storage, file
synchronization, personal cloud and client software services. It can be installed and
run on any OS platform. It provides 2 GB of free storage space, which can scale up to
5 TB.
MediaMax and Strongspace - They offer rented storage space for any kind of
digital data to be stored on cloud servers.
server while your internet connection remains working. When your internet
connection faces technical problems or stops functioning, you will face difficulties
in transmitting data to, or recovering it from, the remote server.
Compliance problems : Many cloud service providers are prone to weaker
compliance, as many countries restrict cloud service providers from exposing user
data across the country's geographic boundaries; providers that do so may be
penalized, and the IT operations of that cloud service provider in the country may
even be shut down, which can lead to huge data loss. Therefore, one should never
purchase cloud storage from an unknown source or from third parties, and should
always buy from well-established companies. Depending on the degree of regulation
within your industry, it might not be possible to operate within the public cloud at
all. This is particularly the case for healthcare, financial services and publicly traded
enterprises, which need to be very cautious when considering this option.
Vulnerability to attacks : With your business information stored in the cloud,
vulnerability to external hacking attacks is always present. The internet is not
entirely secure, and for this reason sensitive data can still be stolen.
Data management : Managing cloud data can be a challenge because cloud storage
systems have their own structures. Your business's current storage management
system may not always fit well with the system offered by the cloud provider.
Data protection concerns : There are issues about the remote storage of sensitive and
essential data. Before adopting cloud technologies, you should be aware that you are
providing confidential business details to a third-party cloud service provider, which
could potentially harm your firm. That is why it is crucial to choose a trustworthy
service provider that you trust to keep your information protected.
on a remote data center system. Users can then access these files via an internet
connection. The cloud storage provider also sells non-storage services for a fee.
Enterprises purchase computing, software, storage and related IT components as discrete
cloud services with a pay-as-you-go license. Customers may choose to lease infrastructure
as a service; platform as a service; or security, software and storage as a service. The level
and type of services chosen are set out in a service level agreement signed with the
provider. The ability to streamline costs by using the cloud can be particularly beneficial
for small and medium-sized organizations with limited budgets and IT staff. The main
advantages of using a cloud storage provider are cost control, elasticity and self-service.
Users can scale computing resources on demand as needed and then discard those
resources after the task has been completed. This removes any concerns about exceeding
storage limitations with on-site networked storage. Some popular cloud storage
providers are Amazon Web Services, Google, Microsoft, Nirvanix and so on.
Descriptions of the popular cloud storage providers are given as follows :
Amazon S3 : Amazon S3 (Simple Storage Service) offers a simple cloud services
interface that can be used to store and retrieve any amount of data from anywhere
on the cloud at any time. It gives every developer access to the same highly scalable
data storage infrastructure that Amazon uses to operate its own global website
network. The goal of the service is to optimize the benefits of scale and to pass those
benefits on to the developers.
Google Bigtable datastore : Google defines Bigtable as a fast and highly scalable
datastore. The Google cloud platform allows Bigtable to scale across thousands of
commodity servers that can together store petabytes of data. Bigtable has been
designed with very high speed, versatility and extremely high scalability in mind.
The size of a Bigtable database can be petabytes, spanning thousands of
distributed servers. Bigtable is now open to developers as part of the Google App
Engine, their cloud computing platform.
Microsoft Live Mesh : Windows Live Mesh was a free-to-use internet-based file
synchronization application designed by Microsoft to enable files and directories
on two or more computers to be synchronized on Windows or Mac OS
platforms. It supports mesh objects that consist of data feeds, which can be
represented in Atom, RSS, JSON or XML. It uses Live Framework APIs to share any
data item between devices that recognize the data.
Nirvanix : Nirvanix offers public, hybrid and private cloud storage services with
usage-based pricing. It supports Cloud-based Network Attached Storage
(CloudNAS) to store data on premises. Nirvanix CloudNAS is intended for
businesses that manage archival, backup or unstructured archives that need
long-term, secure storage, or organizations that use automated processes to migrate
files to mapped drives. CloudNAS has a built-in disaster data recovery and
automatic data replication feature for up to three geographically distributed storage
nodes.
The S3 system allows buckets to be named (Fig. 3.10.2), but the name must be unique in
the S3 namespace across all consumers of AWS. A bucket can be accessed through the S3
web API (with SOAP or REST), much as a normal disk storage system would be.
The performance of S3 limits it to non-operational functions such as data
archiving, retrieval and disk backup. The REST API is preferred to the SOAP API
because it is easier to work with large binary objects in REST.
Amazon S3 offers large volumes of reliable storage with high protection and low
bandwidth access. S3 is most ideal for applications that need storage archives. For
example, S3 is used by large storage sites that share photos and images.
The APIs to manage buckets provide the following features, illustrated by the short
boto3 sketch after this list :
Create new, modify or delete existing buckets.
Upload new objects to a bucket or download them.
Search for and identify objects in buckets.
Identify metadata associated with objects and buckets.
Specify where a bucket is stored.
Provide public access to buckets and objects.
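The following is a minimal sketch of these bucket operations using the boto3 Python SDK; the bucket and key names are invented examples, and AWS credentials and region are assumed to be configured in the environment.

# Minimal boto3 sketch of common S3 bucket operations (names are examples).
import boto3

s3 = boto3.client("s3")  # credentials/region assumed configured externally

s3.create_bucket(Bucket="example-unique-bucket-name")          # create a bucket
s3.put_object(Bucket="example-unique-bucket-name",             # upload an object
              Key="reports/2024/summary.txt",
              Body=b"hello cloud storage")
listing = s3.list_objects_v2(Bucket="example-unique-bucket-name",
                             Prefix="reports/")                # search by prefix
for obj in listing.get("Contents", []):
    head = s3.head_object(Bucket="example-unique-bucket-name", Key=obj["Key"])
    print(obj["Key"], head["ContentLength"])                   # object metadata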
The S3 service can be used by many users as a backup component in a 3-2-1 backup
method, in which your original data is 1, a copy of your data is 2 and an off-site
copy of the data is 3. In this method, S3 serves as the third level of backup. In addition,
Amazon S3 provides a versioning feature.
In versioning, every version of an object stored in an S3 bucket is retained, provided
the user enables the versioning feature. Any HTTP or REST operation, namely PUT,
POST, COPY or DELETE, creates a new object that is stored along with the older
versions. A GET operation retrieves the newest version of the object, but the ability to
recover older versions and undo actions is also available. Versioning is a useful method
for preserving and archiving data. A short boto3 sketch of enabling versioning follows.
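As a sketch of the versioning feature (bucket name invented, credentials assumed configured), the following enables versioning with boto3 and then inspects the retained versions of one key :

# Sketch: enable versioning on a bucket and list the stored versions (boto3).
import boto3

s3 = boto3.client("s3")
bucket = "example-unique-bucket-name"  # invented example name

s3.put_bucket_versioning(Bucket=bucket,
                         VersioningConfiguration={"Status": "Enabled"})

# Two PUTs to the same key now retain both versions of the object.
s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"version one")
s3.put_object(Bucket=bucket, Key="notes.txt", Body=b"version two")

for v in s3.list_object_versions(Bucket=bucket, Prefix="notes.txt")["Versions"]:
    print(v["VersionId"], v["IsLatest"])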
3.10.2 Glacier Vs S3
Both Amazon S3 and Amazon Glacier work almost the same way. However, there are
certain important aspects that reflect the difference between them. Table 3.10.1 shows
the comparison of Amazon Glacier and Amazon S3 :
Amazon Glacier                                             Amazon S3
Archives are recognised by system-generated archive IDs    It can use "friendly" key names
It is extremely low-cost storage                           Its cost is much higher than Amazon Glacier
You can also use the Amazon S3 interface to avail the offerings of Amazon Glacier,
with no need to learn a new interface. This can be done by utilising Glacier as an S3
storage class along with object lifecycle policies, as the following sketch illustrates.
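The following boto3 sketch shows a lifecycle rule that transitions objects to the Glacier storage class after 30 days; the bucket name, rule ID, prefix and day count are all invented example values.

# Sketch: S3 lifecycle rule moving objects to the Glacier storage class.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-unique-bucket-name",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-to-glacier",
            "Status": "Enabled",
            "Filter": {"Prefix": "archive/"},  # apply only under archive/
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }]
    },
)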
Summary
The cloud architecture design is an important aspect while designing a cloud.
Every cloud platform is intended to provide four essential design goals :
scalability, reliability, efficiency and virtualization. To achieve these goals, certain
requirements have to be considered.
The layered architecture of a cloud is composed of three basic layers called
infrastructure, platform, and application. These three levels of architecture are
implemented with virtualization and standardization of cloud-provided
hardware and software resources.
The NIST cloud computing reference architecture is designed with the help of
IT vendors, standards developers, industries and other governmental
agencies at a global level, to support effective cloud computing
security standards and their further development.
Cloud deployment models are defined according to where the computing
infrastructure resides and who controls the infrastructure. Four
deployment models are characterized based on the functionality and
accessibility of cloud services, namely public, private, hybrid and community.
Public cloud services run over the internet, so users who want cloud services
must have an internet connection on their local device. Private cloud services are
used by organizations internally and most of the time run over an intranet
connection. Hybrid cloud services are composed of two or more clouds and offer the
benefits of multiple deployment models, while a community cloud is basically a
combination of one or more public, private or hybrid clouds shared by many
organizations for a single cause.
The most widespread services of cloud computing are categorised into three
service classes, also called cloud service models, namely IaaS, PaaS and SaaS.
Infrastructure-as-a-Service (IaaS) can be defined as the use of servers, storage,
computing power, network and virtualization to form utility-like services for
users. Platform-as-a-Service can be defined as a computing platform that allows
the user to create web applications quickly and easily, without worrying
about buying and maintaining the software and infrastructure, while
Software-as-a-Service is specifically designed for on-demand application or
software delivery to cloud users.
There are several challenges in cloud architectural design, related to data
privacy, security, compliance, performance, interoperability, standardization,
service availability, licensing, data storage and bugs.
Q.1 Bring out differences between private cloud and public cloud. AU : Dec.-16
Ans. : The differences between private cloud and public cloud are given in Table 3.1.
Sr. No.  Parameter   Public cloud              Private cloud
9        Example     Windows Azure, AWS etc.   Openstack, VMware cloud, CloudStack, Eucalyptus etc.
Ans. : Hybrid cloud services are composed of two or more clouds that offer the
benefits of multiple deployment models. A hybrid cloud mostly comprises an
on-premise private cloud and an off-premise public cloud, to leverage the benefits of
both and allow users inside and outside the organization to have access to it. The
hybrid cloud provides flexibility, such that users can migrate their applications and
services from the private cloud to the public cloud and vice versa. It has become most
favored in the IT industry because of its eminent features, like mobility, customized
security, high throughput, scalability, disaster recovery, easy backup and replication
across clouds, high availability and cost efficiency. The other benefits of hybrid cloud
are :
Easy accessibility between private cloud and public cloud, with a plan for disaster
recovery.
The ability to decide what needs to be shared on the public network and what
needs to be kept private.
Unmatched scalability as per demand.
PaaS                                                SaaS
Operational cost is lower than IaaS.                Operational cost is very minimal compared to IaaS and PaaS.
It has lower portability than IaaS.                 It does not provide portability.
Examples : AWS Elastic Beanstalk, Windows Azure,    Examples : Google Apps, Dropbox, Salesforce,
Heroku, Force.com, Google App Engine, Apache        Cisco WebEx, Concur, GoToMeeting
Stratos, OpenShift
Q.6 What are the basic requirements for cloud architecture design ?
Ans. : The basic requirements for cloud architecture design are given as follows :
The cloud architecture design must provide automated delivery of cloud services
along with automated management.
It must support the latest web standards, like Web 2.0 or higher, and REST or RESTful
APIs.
It must support very large-scale HPC infrastructure with both physical and virtual
machines.
The architecture of the cloud must be loosely coupled.
It should provide easy access to cloud services through a self-service web portal.
Cloud management software must efficiently receive user requests, find the
correct resources and then call the provisioning services that invoke the resources
in the cloud.
It must provide enhanced security for shared access to the resources from data
centers.
It must use a cluster architecture to achieve system scalability.
install it on a local device. The platform layer has a collection of software tools for
developing, deploying and testing software applications. A collection of all
software modules required for SaaS applications forms the application layer, which
is mainly responsible for on-demand application delivery. In this layer,
software applications include day-to-day office management software used for
information collection, document processing, calendar and authentication. Enterprises
also use the application layer extensively in business marketing, sales, Customer
Relationship Management (CRM), financial transactions and Supply Chain
Management (SCM).
Q.8 What are different roles of cloud providers ?
Ans. : Cloud provider is an entity that offers cloud services to interested parties. A
cloud provider manages the infrastructure needed for providing cloud services. The
CSP also runs the software to provide services, and organizes the service delivery to
cloud consumers through networks.
SaaS providers then deploy, configure, maintain and update all operations of the
software application on the cloud infrastructure, in order to ensure that services are
provisioned and to fulfil cloud consumer service requests. SaaS providers assume most
of the responsibilities associated with managing and controlling applications deployed
on the infrastructure. On the other hand, SaaS consumers have no or limited
administrative controls.
The major activities of a cloud provider include :
Service deployment : Service deployment refers to provisioning private, public,
hybrid and community cloud models.
Service orchestration : Service orchestration implies the coordination, management
and arrangement of cloud infrastructure to offer optimized capabilities of cloud
services. The capabilities must be cost-effective in managing IT resources and must
be determined by strategic business needs.
Cloud services management : This activity involves all service-related functions
needed to manage and operate the services requested or proposed by cloud
consumers.
Security : Security, which is a critical function in cloud computing, spans all layers in
the reference architecture. Security must be enforced end-to-end. It has a wide range
from physical to application security. CSPs must take care of security.
Privacy : Privacy in cloud must be ensured at different levels, such as user privacy,
data privacy, authorization and authentication, and it must also have adequate
assurance levels. Since clouds allow resources to be shared, privacy challenges are a
big concern for consumers using clouds.
Q.9 What are different complications in PaaS ?
Ans. : The following are some of the complications or issues of using PaaS :
Interoperability : PaaS works best on each provider's own cloud platform, allowing
customers to get the most value out of the service. The risk here is that
customisations or applications developed in one vendor's cloud environment may
not be compatible with another vendor's, and hence may not migrate easily to
it.
Although customers most often accept being tied to a single vendor, this may not
be the situation every time; users may want to keep their options open. In this
situation, developers can opt for open-source solutions. Open-source PaaS provides
elasticity by revealing the underlying code and by allowing the PaaS solution to be
installed on any infrastructure. The disadvantage of using an open-source version
of PaaS is that certain benefits of an integrated platform are lost.
Compatibility : Most businesses have a restricted set of programming languages,
architectural frameworks and databases that they deploy. It is thus important to
make sure that the vendor you choose supports the same technologies. For example,
if you are strongly dedicated to a .NET architecture, then you must select a vendor
with native .NET support. Likewise, database support is critical to performance and
minimising complexity.
The size of the Bigtable database can be petabytes, spanning thousands of distributed
servers. Bigtable is now open to developers as part of the Google App Engine, their
cloud computing platform.
Microsoft Live Mesh : Windows Live Mesh was a free-to-use Internet-based file
synchronization application designed by Microsoft to enable files and directories
between two or more computers to be synchronized on Windows or Mac OS
platforms. It has support of mesh objects that consists of data feeds, which can be
represented in Atom, RSS, JSON, or XML. It uses Live Framework APIs to share any
data item between devices that recognize the data.
Nirvanix : Nirvanix offers public, hybrid and private cloud storage services with
usage-based pricing. It supports Cloud-based Network Attached Storage
(CloudNAS), which lets organizations access cloud storage as if it were on premises.
Nirvanix CloudNAS is intended for businesses that manage archival, backup or
unstructured archives needing long-term, secure storage, or that use automated
processes to migrate files to mapped drives. CloudNAS has built-in disaster recovery
and automatic data replication across up to three geographically distributed storage
nodes.
Q.13 What is Amazon S3 ?
Ans. : Amazon S3 is a cloud-based storage system that allows storage of data objects
ranging from 1 byte up to 5 GB in a flat namespace. The storage containers in S3 are
called buckets. A bucket serves the function of a directory, although there is no object
hierarchy within it; the user saves objects, not files, to a bucket. Amazon S3 offers a
simple web services interface that can be used to store and retrieve any amount of data
from anywhere on the web, at any time. It gives any developer access to the same
scalable, secure, fast and low-cost data storage infrastructure that Amazon uses to run
its own global network of websites.
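As an illustration of this web services interface, the following minimal sketch uses the
boto3 Python SDK; the bucket name, key and region are placeholders, not part of the
original answer.

    import boto3  # AWS SDK for Python

    s3 = boto3.client("s3", region_name="us-east-1")

    # A bucket is a flat container for objects; there is no real directory tree.
    s3.create_bucket(Bucket="example-notes-bucket")

    # Store an object under a key. The key may look like a path, but S3
    # treats it as an opaque name in a flat namespace.
    s3.put_object(Bucket="example-notes-bucket",
                  Key="notes/lecture1.txt",
                  Body=b"Cloud storage demo")

    # Retrieve the object from anywhere over the web, at any time.
    obj = s3.get_object(Bucket="example-notes-bucket", Key="notes/lecture1.txt")
    print(obj["Body"].read())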
Q.1 With architecture, elaborate the various deployment models and reference
models of cloud computing. AU : Dec.-17
Ans. : Refer section 3.3 for cloud deployment models and section 3.4 for cloud reference
models.
Q.2 Describe service and deployment models of cloud computing environment
with illustration. How do they fit in NIST cloud architecture ? AU : Dec.-17
Ans. : Refer section 3.3 for cloud deployment models and section 3.4 for cloud reference
models and section 3.2 for NIST cloud architecture.
Q.3 List the cloud deployment models and give a detailed note about them.
AU : Dec.-16
Ans. : Refer section 3.3 for cloud deployment models.
Q.4 Give the importance of cloud computing and elaborate the different types of
services offered by it. AU : Dec.-16
Ans. : Refer section 3.4 for cloud service models.
Q.5 What are pros and cons for public, private and hybrid cloud ? AU : Dec.-18
Ans. : Refer section 3.3 for pros and cons of public, private and hybrid cloud and
section 3.3.5 for their comparison.
Q.6 Describe Infrastructure as a Service (IaaS), Platform-as-a-Service (PaaS) and
Software-as-a-Service (SaaS) with example. AU : Dec.-18
Ans. : Refer section 3.4 for cloud service models for description of Infrastructure as a
Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS).
Q.7 Illustrate the cloud delivery models in detail. AU : Dec.-19
Ans. : Refer section 3.4 for cloud delivery models.
Q.13 Explain in detail cloud storage along with its pros and cons.
Ans. : Refer section 3.6 for cloud storage and 3.8 for pros and cons of cloud storage.
4 Resource Management and
Security in Cloud
Syllabus
Inter Cloud Resource Management - Resource Provisioning and Resource Provisioning
Methods - Global Exchange of Cloud Resources - Security Overview - Cloud Security
Challenges - Software-as-a-Service Security - Security Governance - Virtual Machine Security -
IAM - Security Standards.
Contents
4.1 Inter Cloud Resource Management
4.2 Resource Provisioning and Resource Provisioning Methods
4.3 Global Exchange of Cloud Resources
4.4 Security Overview
4.5 Cloud Security Challenges
4.6 Software-as-a-Service Security
4.7 Security Governance
4.8 Virtual Machine Security
4.9 IAM
4.10 Security Standards
The consequence is that one cannot directly launch SaaS applications on a cloud
platform. The cloud platform for SaaS cannot be built unless compute, storage and
network infrastructure are established.
In the above architecture, the lower three layers are more closely tied to physical
specifications.
The Hardware as a Service (HaaS) layer is the lowermost layer, providing the
hardware resources needed to run cloud services.
The next layer is Infrastructure as a Service, which interconnects all hardware
elements through compute, storage and network services.
The next layer offers two services : Network as a Service (NaaS), to bind and
provision cloud services over the network, and Location as a Service (LaaS), to
provide a collocation service that houses, controls and protects all physical hardware
and network resources.
The next layer is Platform as a Service, for web application deployment and delivery,
while the topmost layer is used for on-demand application delivery.
In any cloud platform, infrastructure performance is the primary concern for every
cloud service provider, while quality of service, service delivery and security are the
concerns of cloud users. Every SaaS application is subdivided into different application
areas for business applications; for example, CRM is used for sales, promotion and
marketing services. CRM was the first SaaS application delivered successfully on the
cloud. Other tools may provide distributed collaboration, financial management or
human resources management.
In inter cloud resource provisioning, developers have to consider how to design the
system to meet critical requirements such as high throughput, High Availability (HA)
and fault tolerance. The infrastructure for operating cloud computing services may be
either a physical server
or a virtual server. By using VMs, the platform can be flexible, i.e. running services are
not associated with specific hardware platforms. This adds flexibility to cloud computing
platforms. The software layer at the top of the platform is a layer for storing huge
amounts of data.
Like in the cluster environment, there are some runtime support services accessible in
the cloud computing environment. Cluster monitoring is used to obtain the running state
of the cluster as a whole. The scheduler queues the tasks submitted to the entire cluster
and assigns tasks to the processing nodes according to the availability of the node. The
runtime support system helps to keep the cloud cluster working with high efficiency.
Runtime support is the software needed for browser-initiated applications used by
thousands of cloud customers. The SaaS model offers software solutions as a service,
rather than requiring users to buy software. As a result, there is no initial investment in
servers or software licenses on the customer side. On the provider side, the cost is rather
low compared to the conventional hosting of user applications. Customer data is stored in
a cloud that is either private or publicly hosted by PaaS and IaaS providers.
Users employ virtual machines as physical hosts with customized operating systems
for different applications.
For example, Amazon's EC2 uses Xen as the Virtual Machine Monitor (VMM), which is
also used in IBM's Blue Cloud. Some VM templates are supplied on the EC2 platform,
from which users can select different types of VMs, whereas IBM's Blue Cloud provides
no VM templates; in general, any form of VM may run on top of Xen. Microsoft also
applies virtualization in its Azure cloud platform. Providers should deliver
resource-economic services : the growing energy waste from heat dissipation in data
centers means that power-efficient caching, query processing and heat management
schemes are necessary. Public and private clouds promise to streamline software,
hardware and data provisioned as a service, saving on-demand IT deployment costs
and achieving economies of scale in IT operations.
In storage, numerous technologies are available, such as SCSI, SATA, SSDs and flash
storage. In the future, hard disk drives combined with solid-state drives may be used as
an enhancement in storage technology, ensuring reliable, high-performance data
storage. The key obstacles to the adoption of flash memory in data centers have been
price, capacity and, to some extent, a lack of specialized query processing techniques.
However, this is about to change, as the I/O bandwidth of solid-state drives is
becoming too impressive to overlook.
Databases are very popular for many applications, as they are used as the underlying
storage containers. The size of such a database can be very large when processing huge
quantities of data. The main aim is to store data in structured or semi-structured form
so that application developers can use it easily and construct their applications quickly.
Traditional databases may hit a performance bottleneck when the system is extended
to a larger scale; however, some real applications do not need such strong consistency.
Typical cloud databases include Google's Bigtable, Amazon's SimpleDB or DynamoDB,
and the Azure SQL service from Microsoft Azure.
The InterGrid technologies build on distributed grid tools. The InterGrid allocates and
manages a Distributed Virtual Environment (DVE), a cluster of available VMs isolated
from other virtual clusters. The DVE manager component performs resource allocation
and management on behalf of particular user applications. The central component of
the InterGrid Gateway (IGG) is the scheduler, which enforces provisioning policies and
peers with other gateways. The communication system provides an asynchronous
message-passing mechanism that is handled in parallel by a thread pool.
In demand-driven resource provisioning, resources are allocated according to user
demand in a dynamic environment. This method adds or removes computing instances
depending on the current utilization level of the allocated resources. For example, the
demand-driven method automatically allocates two CPUs to a user application when
the user has kept one CPU busy more than 60 percent of the time for an extended
period. In general, when a resource exceeds a threshold for a certain amount of time,
the system increases that resource on the basis of demand; when it is utilized below the
threshold for a certain amount of time, the resource is reduced accordingly. This
method is implemented by the Amazon Web Services auto-scaling feature on its EC2
servers. It is very easy to implement, but it does not work well if the workload changes
abruptly.
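The threshold rule described above can be sketched in a few lines of Python; the
60 percent threshold matches the example in the text, while the sampling window and
the scale-in threshold are illustrative assumptions rather than any real provider's API.

    # Hypothetical demand-driven auto-scaler : grow when utilization stays
    # above the upper threshold, shrink when it stays below the lower one.
    UPPER, LOWER = 0.60, 0.20   # CPU utilization thresholds
    SUSTAINED = 5               # consecutive samples required ("extended period")

    def rescale(samples, instances):
        """samples : recent CPU utilization readings in the range 0.0 - 1.0."""
        recent = samples[-SUSTAINED:]
        if len(recent) < SUSTAINED:
            return instances              # not enough history yet
        if all(s > UPPER for s in recent):
            return instances + 1          # sustained overload : scale out
        if all(s < LOWER for s in recent) and instances > 1:
            return instances - 1          # sustained idleness : scale in
        return instances                  # otherwise keep the current size

    print(rescale([0.70, 0.80, 0.65, 0.90, 0.75], instances=2))  # -> 3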
In event-driven resource provisioning, resources are allocated when an event
generated by users occurs at a specific time interval in a dynamic environment. This
method adds or removes machine instances based on a specific time event. The
approach works well for seasonal or predicted events, when additional resources are
required for a short interval : the number of users increases before the event and
decreases after it. This scheme estimates peak traffic before the event happens and
results in only a small loss of QoS if the event is correctly predicted. Otherwise,
resources are wasted when events do not follow a fixed pattern.
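By contrast, event-driven provisioning reduces to a calendar rule; the event window
and instance counts below are made-up values for illustration.

    from datetime import datetime

    # Hypothetical event-driven rule : reserve extra instances around a
    # known seasonal event, then release them afterwards.
    EVENT_START = datetime(2024, 11, 25)
    EVENT_END = datetime(2024, 11, 30)

    def planned_instances(now, baseline=2, surge=10):
        # Capacity is raised for the predicted window only.
        return surge if EVENT_START <= now <= EVENT_END else baseline

    print(planned_instances(datetime(2024, 11, 26)))  # -> 10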
No single cloud infrastructure provider can set up data centers at every location
around the world, which makes it difficult for one provider alone to meet the QoS
targets of all cloud customers. Cloud application service (SaaS) providers therefore
want to draw on the resources of multiple infrastructure providers that can best serve
their particular needs. This requirement often arises in companies with global
operations and applications such as Internet services, media hosting and Web 2.0
applications, and it implies a federation of cloud infrastructure providers offering
services to multiple cloud providers. To accomplish this, the Intercloud architecture
has been proposed, enabling brokerage and sharing of cloud resources so that
applications can scale across multiple clouds. The generalized Intercloud architecture
is shown in Fig. 4.3.1.
Cloud providers must also be able to demonstrate compliance to their auditors. Lack
of trust between service providers and cloud users has prevented cloud computing
from being generally accepted as a solution for on-demand services.
Trust and privacy are also more challenging for web and cloud services, as many
desktop and server users have resisted handing their applications over to a cloud
provider's data center. Some users worry about the lack of privacy, security and
copyright protection on cloud platforms. Trust is not merely a technological question
but a social problem; nevertheless, the social problem can be addressed with a
technical approach. The cloud's virtual environment poses new security threats that
are harder to manage than traditional client and server configurations. Therefore, a
new data protection model is needed to solve these problems.
Three basic levels of cloud security enforcement are expected. First, data center
security requires year-round on-site protection; biometric readers, CCTV
(Closed-Circuit TV), motion detection and man traps are frequently deployed. Second,
global firewalls, Intrusion Detection Systems (IDSes) and third-party vulnerability
assessments are often required to achieve fault-tolerant network security. Finally, the
security platform must provide SSL transmission, data encryption, strict password
policies and certification of the system's trust, since cloud servers can be either
physical machines or virtual machines. Security compliance requires a security-aware
cloud architecture that provides remedies for malware-based attacks such as worms,
viruses and DDoS attacks that exploit system vulnerabilities. Such attacks compromise
system functionality or give intruders unauthorized access to critical information.
There is therefore a need to protect the infrastructure first; infrastructure security is a
key factor in cloud security. The cloud is composed of a network of connected servers,
called hosts, with applications deployed on them.
Infrastructure security follows a three-level security model composed of network-level
security, host-level security and application-level security. The three levels are
explained as follows.
cloud provider, so data confidentiality and integrity must be ensured together.
For example, according to an Amazon Web Services (AWS) security vulnerability
report, users employed digital signature algorithms to access Amazon SimpleDB and
Amazon Elastic Compute Cloud (EC2) over HTTP instead of HTTPS. As a result, they
faced an increased risk that their data could have been altered in transit without their
knowledge.
The common types of attacks at the network, host and application levels are
summarized in Table 4.4.1.
Table 4.4.1 : Common types of attacks at network, host and application levels
In a cloud environment, computing resources are shared with other companies, and
you may have no awareness or control of where those resources run in a shared pool
outside the organization's boundary. Sharing such an environment may give the
government a plausible reason to seize your assets because another company has
violated the law; you may thus put your data at risk of seizure simply by sharing the
cloud environment. Moreover, if you want to switch from one cloud provider to
another, the storage services offered by one vendor may be incompatible with
another's platform services; Amazon's Simple Storage Service (S3), for example, is
incompatible with IBM's Blue Cloud, Dell or the Google cloud platform. In cloud
storage, most clients want their data encrypted in both directions across the Internet
via SSL (Secure Sockets Layer), and most also want it encrypted while it sits in the
storage pool. But who controls the encryption and decryption keys when the data is
encrypted in the cloud : the client or the vendor ? These questions often remain
unanswered. Therefore, before moving data to the cloud, make sure that the
encryption and decryption keys are created and tested, just as when data resides on
your own servers.
Data integrity means ensuring that data is maintained identically through every
operation (e.g. transmission, storage or retrieval); in other words, data integrity
ensures the consistency and correctness of the data. Ensuring integrity means that data
changes only through authorized transactions. This sounds good, but remember that
there is still no common standard for ensuring data integrity in the cloud.
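In the absence of such a standard, one common tactic is to compare cryptographic
digests before upload and after download; a minimal sketch using Python's standard
hashlib module :

    import hashlib

    def sha256_digest(data: bytes) -> str:
        # The digest changes if even a single bit of the data changes.
        return hashlib.sha256(data).hexdigest()

    original = b"quarterly-report-v1"
    stored_digest = sha256_digest(original)   # keep this outside the cloud

    retrieved = original    # imagine this came back from cloud storage
    assert sha256_digest(retrieved) == stored_digest, "integrity check failed"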
Using SaaS services in the cloud means that far less software is developed in-house,
but if you plan to use internally developed cloud code, a formal Secure Software
Development Life Cycle (SDLC) becomes even more important. Inadequate use of
mashup technology (combinations of web services), which is crucial for cloud
applications, will almost certainly introduce unknown security vulnerabilities into
those applications. A security model should be integrated into the development tools
to guide developers during the development phase and to restrict users to their
authorized data once the system has been deployed. As a growing number of
mission-critical processes move into the cloud, SaaS providers will need to provide log
information directly and in real time, probably to their administrators and customers
alike. Someone must take responsibility for monitoring security and compliance
controls; without end-to-end tracking of applications and data, end users cannot
achieve compliance.
Because the Payment Card Industry Data Security Standard (PCI DSS) mandates
access to logs, auditors and regulators may refer to them when auditing a security
report. Security managers must therefore ensure that any service agreement gives
them access to the service provider's logs. Cloud applications are constantly being
enhanced with new features, and users must stay up to date about app improvements
to make sure they are protected. The speed at which cloud applications change affects
both the SDLC and security. For example, Microsoft's SDLC assumes that
mission-critical software has a three-to-five-year period during which it will not
change substantially, but a cloud application may change every couple of weeks.
Unfortunately, a secure SDLC cannot deliver a security cycle that keeps pace with such
rapid change, which means that users must update continuously, since an older
version may no longer work or protect the data.
Appropriate fail-over technology is an often-overlooked aspect of securing the cloud.
A company may survive if a non-mission-critical application goes offline, but it cannot
survive the loss of a mission-critical one. Security must shift to the device and data
level, so that businesses can be sure their data is protected wherever it goes; in cloud
computing, security at the data level is one of the major challenges.
Most compliance regimes do not yet envision enforcement in a cloud world. The wide
range of IT security and compliance standards regulating business interactions will
have to be translated to the cloud over time. SaaS makes it much more difficult for a
customer to determine where its data resides on a network controlled by its SaaS
provider, or by a partner of that provider, which raises all kinds of concerns about
data privacy, aggregation and security enforcement. Many compliance regulations
require that data not be intermixed with other data, for instance on shared servers or
in shared databases. Some governments place strict limits on what data about their
citizens can be stored and for how long, and some banking regulations require that
customers' financial data remain in their home country.
Many mobile IT users can access business data and infrastructure through cloud-based
applications without traversing the corporate network, which increases the need for
businesses to police security between mobile users and cloud-based services. Placing
large amounts of confidential information in a global cloud exposes companies to
broad distributed threats : attackers no longer have to come on-site to steal data,
because it can all be found in one "virtual" location. The efficiencies of cloud
virtualization require that virtual machines from multiple organizations be co-located
on the same physical resources. Although traditional data center security still applies
in the cloud environment, physical segregation and hardware-based security cannot
protect against attacks between virtual machines on the same server. Administrative
access is via the Internet, rather than the controlled and restricted direct or on-site
connections of the conventional data center model; this increases risk and exposure
and demands strict monitoring of changes to system control and of access-control
restrictions.
The complex and flexible nature of virtual machines makes it hard to maintain and
audit their security; it can be difficult to prove the security state of a system and to
identify the location of an insecure virtual machine. No matter where a virtual
machine sits in the virtual environment, intrusion detection and prevention systems
must be able to detect malicious activity on it. The interconnection of several virtual
machines increases the attack surface and the risk of machine-to-machine compromise.
Individual virtual machines and physical servers in the cloud environment use the
same operating systems and the same business and web applications, which raises the
threat of an attacker or malware exploiting vulnerabilities remotely. Virtual machines
also become vulnerable as they move between a private cloud and a public cloud. A
fully or partially shared cloud system presents a greater attack surface, and therefore
greater risk, than a dedicated resource environment. Operating systems and
application files in a virtualized cloud environment reside on shared physical
infrastructure, which requires system, file and activity monitoring to give corporate
clients confidence, and auditable proof, that their resources have not been
compromised or tampered with.
In the cloud computing environment, the organization subscribing to cloud resources
is responsible for patching, not the cloud provider; patch-maintenance awareness is
therefore essential. Companies are frequently required to prove consistent compliance
with security regulations, standards and auditing practices, irrespective of where the
systems holding their data are located. In the cloud, data is mobile : it may sit on
on-premises physical servers, on-premises virtual machines or off-premises virtual
cloud services, and auditors and practice managers may have to reconsider their
approach accordingly. In their eagerness to profit from the benefits of cloud
computing, including its significant cost savings, many companies are likely to rush in
without seriously considering the security implications.
To create zones of trust in the cloud, the virtual machines must protect themselves, in
effect moving the perimeter to the virtual machine itself. Enterprise perimeter security
is provided through firewalls, network segmentation, IDS/IPS, monitoring tools,
De-Militarized Zones (DMZs) and the security policies associated with them; these
strategies and policies control the data that resides behind, or transits, the perimeter.
In the cloud computing environment, the cloud service provider is responsible for the
security and privacy of the customer's data.
a) Compliance issue
Compliance relates to the regulatory standards imposed by a country's laws or
legislation on the use of personal information and on data privacy. Compliance
restricts how cloud service providers may use or share personally identifiable
information. Various regulatory standards for data privacy exist in the USA, such as
the USA PATRIOT Act, HIPAA, GLBA and FISMA.
The compliance concern depends on factors such as applicable laws, regulations,
standards, contractual commitments and privacy requirements. For example, because
the cloud is a multitenant environment, user data is stored across multiple countries,
regions or states, and each jurisdiction has its own legislation on the use and sharing
of personal data that restricts how such data may be handled.
b) Storage issue
In the cloud, storage is a major privacy issue : the multitenant environment makes
multiple copies of a user's data and stores them in multiple data centers across
multiple countries, so users never know where their personal data is stored or in
which country. The storage concern is about the location of users' data. The main
questions for a user or organization are : Where is the data stored ? Has it been
transferred to a data center in another country ? What privacy standards do those
countries enforce that limit the transfer of personal data ?
c) Retention issue
The retention issue concerns how long personal data is kept in storage and under
what retention policies. Each Cloud Service Provider (CSP) has its own set of retention
policies governing the data, so a user or organization has to examine the CSP's
retention policy, together with its exceptions.
d) Access issue
The access issue concerns the organization's ability to give individuals access to their
personal information and to comply with stated requests. A user or organization has
the right to know what personal data is kept in the cloud and may request that the
CSP stop processing it or delete it from the cloud.
e) Destruction of data
At the end of the retention period, the CSP is expected to destroy the Personally
Identifiable Information (PII). The concern here is that the organization never learns
whether its data or PII in the cloud has actually been destroyed by the CSP, whether
additional copies have been kept, or whether the data has merely been made
inaccessible to the organization.
The survey firm Gartner lists seven security issues that should be discussed with a
cloud computing provider :
Data location : Is it possible for the provider to verify where the data is located ?
Data segregation : Ensure that encryption is effective at all times and that the
encryption schemes were designed and tested by qualified experts.
Recovery : Find out what will happen to the data in the event of a disaster. Does
the provider offer full restoration ? If so, how long does it take ?
Privileged user access : Find out who has privileged access to data, and how such
administrators are hired and managed.
Regulatory compliance : Ensure that the vendor is willing to undergo external
audits and/or security certifications.
Long-term viability : What happens to the data if the company goes out of
business ? How, and in what format, will the data be returned ?
Investigative support : Is the vendor able to investigate any inappropriate or illegal
activity ?
Assessing data protection is now more difficult, which makes data security roles more
critical than in past years. Beyond Gartner's recommendations, one tactic is to encrypt
the data yourself : if you encrypt the data using a trustworthy algorithm, then
regardless of the service provider's security and encryption policies, the data will be
accessible only with the decryption key. Of course, this leads to a further problem :
how do you manage private keys in a pay-on-demand computing infrastructure ? To
deal with the security issues above, along with those mentioned earlier, SaaS
providers will have to incorporate and enhance the security practices that managed
service providers deliver, and develop new ones as the cloud environment evolves. A
formally chartered security organization and programme is one of the most critical
activities for a security team : it fosters a shared view of what the security leadership
is and aims to achieve, encouraging 'ownership' of the group's success.
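As a sketch of the encrypt-it-yourself tactic, the following uses the third-party
cryptography package for Python; the data is a placeholder, and the point of the
exercise is that the key stays entirely on the client side.

    from cryptography.fernet import Fernet  # pip install cryptography

    # Generate and keep the key yourself; never hand it to the storage provider.
    key = Fernet.generate_key()
    cipher = Fernet(key)

    plaintext = b"customer records"
    ciphertext = cipher.encrypt(plaintext)   # this is what gets uploaded

    # Only the holder of the key can recover the data, regardless of the
    # provider's own security and encryption policies.
    assert cipher.decrypt(ciphertext) == plaintext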
1. Risk Assessment
Security risk assessment is crucial in helping the information security organization
make informed decisions when balancing the dueling priorities of business utility and
protection of assets. Failure to carry out formal risk assessments can contribute to an
increase in security audit findings, can jeopardize certification goals, and can lead to
an ineffective and inefficient selection of security controls that do not adequately
mitigate information security risks. A structured information security risk
management process can proactively identify, assess and manage security risks on a
daily or as-required basis. Applications and infrastructure should additionally receive
more comprehensive technical risk assessments in the form of threat modeling, which
helps product management and engineering groups be more proactive in design and
testing and in collaborating with the internal security team. Threat modeling requires
both business-process knowledge and technical knowledge of how the applications or
systems under review work.
2. Risk Management
The identification of technological assets; identification of data with its connections to
processes, applications, and storage of data; and assignment of ownership with custodial
responsibilities are part of effective risk management. The risk management measures
will also involve maintaining an information asset repository. Owners have the
responsibility and privileges to ensure the confidentiality, integrity, availability and
privacy of information assets, including protective requirements. A formal risk
assessment process for allocating security resources related to business continuity must
be developed.
3. Security Awareness
Security awareness and culture are among the few effective methods for managing
human risks in security. Failure to provide people with adequate knowledge and
training may expose the organization to a variety of security risks in which people,
rather than systems or application vulnerabilities, are the threats and points of entry.
Risks arising from the lack of an effective security awareness programme include
social engineering attacks, reputational damage, sluggish response to potential
security incidents, and inadvertent leakage of customer data. A one-size-fits-all
approach to security awareness is not necessarily right for every SaaS organization; an
information security awareness and training programme that tailors the material to
each person's role in the organization matters more. For example, development
engineers can receive security awareness in the form of secure coding and testing
training, while customer service representatives receive data privacy and security
certification training. An ideal approach combines generic awareness training with
role-specific content.
4. Security Portfolio Management
Poor portfolio and project management can result in projects never being completed
or never realizing their expected returns. Excessive and unrealistic workload
expectations arise when projects are not prioritized according to strategy, goals and
resource capability. The security team should ensure that a project plan and a project
manager with appropriate training and experience are in place for each new project it
undertakes, so that the project can be seen through to completion. Portfolio and
project management capabilities can be enhanced by developing methodologies, tools
and processes that support the expected complexity of projects, for both traditional
business practices and cloud-based approaches.
A well-designed Business Continuity / Disaster Recovery (BC/DR) plan not only helps
critical applications recover from failure, but can also reduce the overall complexity,
cost and risk of managing them on a regular basis. The cloud also offers dramatic
prospects for cost-effective BC/DR solutions.
The Secure Software Development Life Cycle consists of six phases which are shown
in Fig. 4.7.1 and described as follows.
I. Initial Investigation : To define and document project processes and goals in the
security policy of the program.
II. Requirement Analysis : To analyze recent security policies and systems,
assessment of emerging threats and controls, study of legal issues, and perform
risk analysis.
III. Logical design : To develop a security plan; planning for incident response
measures; business responses to disaster; and determine whether the project can
be carried on and/or outsourced.
IV. Physical design : Selecting technologies to support the security plan, developing a
solution definition that is successful, designing physical security measures to
support technological solutions and reviewing and approving plans.
V. Implementation : Purchase or create solutions for security. Submit a tested
management package for approval at the end of this stage.
VI. Maintenance : Monitor, test and maintain the application code continuously for
efficient enhancement. Additional security processes, such as external and
internal penetration testing and standard security requirements for data
classification, should be developed to support application development projects.
Formal training and communication should also be introduced to raise
awareness of process improvement.
4.9 IAM
Identity and Access Management (IAM) is a vital function for every organisation, and
SaaS customers have a fundamental expectation that their data is protected according
to the principle of least privilege. This principle states that a user should be granted
only the minimum access required to perform an operation, and only for the
minimum time required. Aspects of current trust models and principles must be
re-examined for the cloud.
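Least privilege is often expressed as a policy document. The following AWS-style
statement, written here as a Python dictionary, is illustrative only : the bucket name is
invented, and the two actions grant read access and nothing more.

    # Hypothetical least-privilege policy : read-only access to one bucket.
    read_only_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-notes-bucket",
                "arn:aws:s3:::example-notes-bucket/*",
            ],
        }],
    }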
The IAM architecture is made up of several processes and activities (see Fig. 4.9.2). The
processes supported by IAM are given as follows.
TLS authentication is normally one-way : the client authenticates the server, while the
client itself remains unauthenticated. At the browser level, this means the browser
validates the server's certificate by checking the digital signatures along the server
certificate's issuing chain of Certification Authorities (CAs). This validation identifies
the server to the browser, not to the end user. To be truly sure of the server's identity,
the end user must inspect the identifying information in the server's certificate : the
only way to securely establish the server's identity is to check that the URL, name or
address being used matches what the certificate specifies. Malicious websites cannot
use another website's valid certificate, because they do not possess the corresponding
private key and therefore cannot decrypt transmissions secured under the genuine
certificate. Since only a trustworthy CA can embed a URL in a certificate, comparing
the apparent URL with the URL specified in the certificate is a sound check. TLS also
supports a more secure bilateral connection mode, which ensures that both ends of the
connection are communicating with the party they believe they are connected to. This
is called mutual authentication; for mutual authentication, the TLS client side must
also hold a certificate.
TLS involves three basic phases. First, in cipher suite negotiation, the client and the
server negotiate which ciphers will be used. Second, in authentication and key
exchange, the authentication and key exchange algorithms are decided; these are
public key algorithms. Third, in symmetric encryption and message authentication,
the message authentication codes are determined; cryptographic hash functions are
used for the message authentication codes. Once these decisions are made, data
transfer can commence.
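The one-way authentication described above can be observed with Python's standard
ssl module; the host name is a placeholder.

    import socket
    import ssl

    context = ssl.create_default_context()  # loads the trusted CA certificates

    # One-way TLS : the client validates the server's certificate chain and
    # checks that the certificate matches the host name; the client itself
    # presents no certificate.
    with socket.create_connection(("example.com", 443)) as sock:
        with context.wrap_socket(sock, server_hostname="example.com") as tls:
            print(tls.version())                  # negotiated protocol version
            print(tls.getpeercert()["subject"])   # identity claimed by the server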
Summary
The provisioning of storage resources in the cloud is often associated with terms
like distributed file systems, storage technologies and databases.
There are three methods of resource provisioning namely Demand-Driven,
Event-Driven and Popularity-Driven.
The cloud providers can expand or redimension their provision capacity in a
competitive and dynamic manner by leasing the computation and storage
resources of other cloud service providers with the use of Intercloud
architectural principles.
Although cloud computing has many benefits, security issues on cloud platforms
have led many companies to hesitate before migrating their essential resources to
the cloud.
Even though cloud computing and virtualization can enhance business efficiency
by breaking the physical ties between an IT infrastructure and its users, the
increased security threats must be resolved in order to fully benefit from this new
computing paradigm.
Key security concerns on cloud platforms include trust, privacy, security
enforcement and copyright protection.
Key privacy issues in the cloud computing are Compliance issue, Storage
concern, Retention concern, Access Concern, Auditing and monitoring and so
on.
The lack of a formalized strategy can lead to an unsupportable operating model
and an inconsistent level of security.
The essential factors required in security governance are Risk Assessment and
management, Security Awareness, Security Portfolio Management, Security
Standards, Guidelines and Policies, Security Monitoring and Incident Response,
Business Continuity Plan and Disaster Recovery and so on.
To overcome the security attacks on VMs, Network level IDS or Hardware level
IDS can be used for protection, shepherding programs can be applied for code
execution control and verification and additional security technologies can be
used.
Identity and Access Management (IAM) is a security framework composed of
policy and governance components used for the creation, maintenance and
termination of digital identities, with controlled access to shared resources. It is
composed of multiple processes, components, services and standard practices.
Security standards are needed to define the processes, measures and practices
required to implement the security program in a web or network environment.
Q.1 List any four host security threats in public IaaS. AU : Dec.-17
Ans. : The most common host security threats in a public IaaS cloud are :
Hijacking of accounts that are not properly secured.
Stealing keys, such as the SSH private keys used to access and manage hosts.
Attacking unpatched and vulnerable services listening on standard ports such as FTP,
NetBIOS and SSH.
Attacking systems that are not protected by host firewalls.
Deploying Trojans and viruses embedded in the software running inside the VM.
Q.1 “In today’s world, infrastructure security and data security are highly challenging at
network, host and application levels”. Justify, and explain the several ways of protecting
data in transit and at rest. AU : May-18
Ans. : Refer section 4.4.1 to 4.4.4.
Q.2 Explain the baseline Identity and Access Management (IAM) factors to be practiced by
the stakeholders of cloud services, and the common key privacy issues likely to occur in the
cloud environment. AU : May-18
Ans. : Refer section 4.9 for identity and access management and section 4.5.1 for the
common key privacy issues likely to occur in the cloud environment.
Q.3 What is the purpose of IAM ? Describe its functional architecture with an illustration.
AU : Dec.-17
Ans. : Refer section 4.9.
Q.4 Write details about cloud security infrastructure. AU : Dec.-16
Ans. : Refer section 4.4.
Q.5 Write detailed note on identity and access management architecture. AU : May-17
Ans. : Refer section 4.9.
Q.6 Describe the IAM practices in SaaS, PaaS and IaaS availability in cloud. AU : Dec.-19
Ans. : Refer section 4.9.
Q.7 How is the identity and access management established in cloud to counter threats ?
AU : May-19
Ans. : Refer section 4.9.
Q.8 Write a detailed note on resource provisioning and resource provisioning methods.
Ans. : Refer section 4.2.
Q.9 How can security governance be achieved in a cloud computing environment ?
Ans. : Refer section 4.7.
Q.10 Explain the different security standards used in cloud computing.
Ans. : Refer section 4.10.
5
Cloud Technologies and
Advancements
Syllabus
Hadoop – MapReduce – Virtual Box – Google App Engine – Programming Environment for
Google App Engine – Open Stack – Federation in the Cloud – Four Levels of Federation –
Federated Services and Applications – Future of Federation.
Contents
5.1 Hadoop
5.2 Hadoop Distributed File system (HDFS)
5.3 Map Reduce
5.4 Virtual Box
5.5 Google App Engine
5.6 Programming Environment for Google App Engine
5.7 Open Stack
5.8 Federation in the Cloud
5.9 Four Levels of Federation
5.10 Federated Services and Applications
5.11 The Future of Federation
5.1 Hadoop
With the evolution of the internet and related technologies, high computational power,
large-volume data storage and fast data processing have become basic needs for most
organizations, and those needs have grown significantly over time. Organizations now
produce huge amounts of data at a rapid rate. Recent surveys of data generation report
that Facebook produces roughly 600+ TB of data per day and analyzes 30+ petabytes of
user-generated data; a Boeing jet airplane generates more than 10 TB of data per flight,
including geo maps, special images and other information; and Walmart handles more
than 1 million customer transactions every hour, imported into databases estimated to
hold more than 2.5 petabytes of data.
There is thus a need to acquire, analyze, process, handle and store such huge amounts
of data, called big data. The different challenges associated with big data are as
follows :
a) Volume : Volume relates to the size of big data. The amount of data is growing
day by day and is very huge; according to IBM, in the year 2000, 8 lakh petabytes
of data were stored in the world. The challenge here is how to deal with such
huge data.
b) Variety : Variety relates to the different formats of big data. Nowadays, most of
the data stored by organizations has no proper structure and is called
unstructured data. Such data has a complex structure and cannot be represented
using rows and columns. The challenge here is how to store different formats of
data in databases.
c) Velocity : Velocity relates to the speed of data generation, which is very fast. It is
the rate at which data is captured, generated and shared. The challenge here is
how to react to the massive flow of information in the time required by the
application.
d) Veracity : Veracity refers to the uncertainty of data. Data stored in a database is
sometimes inaccurate or inconsistent, which results in poor data quality, and
inconsistent data requires a lot of effort to process.
Traditional database management techniques cannot satisfy the above four
characteristics and do not support storing, processing, handling and analyzing big
data. These challenges can be addressed using one of the most popular frameworks,
provided by Apache, called Hadoop.
Apache Hadoop is an open source software project that enables distributed processing
of large data sets across clusters of commodity servers using simple programming
models.
5) Pig : Pig is a platform for analyzing large data sets using a high-level language. It
uses a dataflow language and provides a parallel execution framework.
Of all the above components, HDFS and MapReduce are the two core components of
the Hadoop framework; they are explained in the next sections.
1. Name Node
An HDFS cluster consists of a single name node, called the master server, which
manages the file system namespace and regulates access to files by clients. It runs on
commodity hardware and stores all the metadata for the file system across the cluster.
The name node serves as the single arbitrator and repository for HDFS metadata,
which is kept in main memory for fast random access. The entire file system
namespace is contained in a file called FsImage, stored on the name node's local file
system, while the transaction log is recorded in the EditLog file.
2. Data Node
An HDFS cluster contains multiple data nodes, each managing the storage attached to
the node it runs on. Data nodes are used to store users' data on HDFS clusters.
Internally, a file is split into one or more blocks, which are stored on data nodes. The
data nodes are responsible for handling read/write requests from clients; they also
perform block creation, deletion and replication upon instruction from the name node.
A data node stores each HDFS data block in a separate file, and the blocks of one file
are spread across different data nodes. The requirement of such a block-structured file
system is to store, manage and access file metadata reliably.
The representation of name node and data node is shown in Fig. 5.2.2.
3. HDFS Client
In Hadoop distributed file system, the user applications access the file system using
the HDFS client. Like any other file systems, HDFS supports various operations to read,
write and delete files, and operations to create and delete directories. The user references
files and directories by paths in the namespace. The user application does not need to
be aware that file system metadata and storage are on different servers, or that blocks have
multiple replicas. When an application reads a file, the HDFS client first asks the name
node for the list of data nodes that host replicas of the blocks of the file. The client
contacts a data node directly and requests the transfer of the desired block. When a client
writes, it first asks the name node to choose data nodes to host replicas of the first block of
the file. The client organizes a pipeline from node-to-node and sends the data. When the
first block is filled, the client requests new data nodes to be chosen to host replicas of the
next block. The choice of data nodes for each block is likely to be different.
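This read/write interaction can be driven from Python through WebHDFS; the sketch
below uses the third-party hdfs package and assumes a name node reachable at the
address shown.

    from hdfs import InsecureClient  # pip install hdfs (a WebHDFS client)

    # The client asks the name node for metadata, then talks to data nodes.
    client = InsecureClient("http://namenode-host:9870", user="hadoop")

    # Write : blocks are pipelined to data nodes chosen by the name node.
    with client.write("/user/hadoop/demo.txt", overwrite=True) as writer:
        writer.write(b"hello hdfs")

    # Read : block replicas are fetched directly from a data node.
    with client.read("/user/hadoop/demo.txt") as reader:
        print(reader.read())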
4. HDFS Blocks
In general, user data is stored in HDFS in terms of blocks : the files in the file system
are divided into one or more segments called blocks. The default size of an HDFS
block is 64 MB, which can be increased as needed.
HDFS is fault tolerant : if a data node fails, the block being written to it is re-replicated
to another node. The block size, number of replicas and replication factor are specified
in the Hadoop configuration file. Synchronization between the name node and the
data nodes is achieved through heartbeat messages, which are periodically sent by
each data node to the name node.
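For reference, these values are set through properties such as dfs.replication and
dfs.blocksize in the hdfs-site.xml configuration file; the values below are illustrative.

    <configuration>
      <!-- replication factor for each HDFS block -->
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
      <!-- block size in bytes; 134217728 = 128 MB -->
      <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
      </property>
    </configuration>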
Apart from the above components, a job tracker and task trackers are used when a
MapReduce application runs over HDFS. Hadoop core consists of one master job
tracker and several task trackers : the job tracker runs on the name node as a master,
while task trackers run on the data nodes as slaves.
The job tracker is responsible for taking requests from clients and assigning them, as
tasks, to task trackers. It always tries to assign tasks to the task tracker on the data
node where the data is locally present. If that node fails for some reason, the job
tracker assigns the task to another task tracker where a replica of the data exists, since
data blocks are replicated across the data nodes. This ensures that a job does not fail
even if a node fails within the cluster.
Every MapReduce program undergoes different phases of execution. Each phase has
its own significance in MapReduce framework. The different phases of execution in
MapReduce are shown in Fig. 5.3.2 and explained as follows.
In the input phase, a large data set in the form of <key, value> pairs is provided as the
standard input to the MapReduce program. The input files used by MapReduce are
kept on the HDFS (Hadoop Distributed File System) store and have a standard
InputFormat specified by the user.
Once the input file is selected, the split phase reads the input data and divides it into
smaller chunks, which are then given to the mapper. The map operation extracts the
relevant data and generates intermediate <key, value> pairs : it reads input data from
the splits using a record reader and transforms the input key/value list into an output
key/value list, which is then passed to the combiner.
The combiner is used between the mapper and the reducer to reduce the volume of
data transfer. Also known as a semi-reducer, it accepts input from the mapper and
passes the output <key, value> pairs to the reducer. The shuffle and sort are
components of the reducer. Shuffling is the process of partitioning the mapped output
and moving it to the reducers, where intermediate keys are assigned to a reducer; each
partition, called a subset, becomes the input of one reducer. In general, the shuffle
phase ensures that the partitioned splits reach the appropriate reducers, where each
reducer retrieves its own partition from the mappers using the HTTP protocol.
The sort phase is responsible for automatically sorting the intermediate keys on a
single node before they are presented to the reducer. The shuffle and sort phases occur
simultaneously, as the mapped outputs are fetched and merged.
The reducer reduces each set of intermediate values that shares a key to a smaller set
of values. It uses the sorted input to generate the final output, which is written by the
reducer using a record writer into an output file with the standard output format. The
final output of a MapReduce program consists of <key, value> pairs written into an
output file, which is written back to the HDFS store. An example of the word count
process using MapReduce, with all phases of execution, is illustrated in Fig. 5.3.3.
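A word-count mapper and reducer can be written as two small Python scripts and run
with Hadoop Streaming; the file names are illustrative. The phases map directly onto
the code : the mapper emits intermediate <word, 1> pairs, and the reducer receives
them sorted by key after shuffle and sort.

    # mapper.py - map phase : emit one intermediate <word, 1> pair per word.
    import sys
    for line in sys.stdin:
        for word in line.strip().split():
            print(word + "\t1")

    # reducer.py - reduce phase : input arrives sorted by key, so the counts
    # for each word are adjacent and can be summed in a single pass.
    import sys
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(current + "\t" + str(total))
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(current + "\t" + str(total))

The pair can be tested without a cluster by piping the mapper's output through a shell
sort into the reducer, which mimics the shuffle and sort phases.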
Fig. 5.5.1 : Functional architecture of the Google cloud platform for app engine
The infrastructure for the Google cloud is managed inside its data centers, and all
Google cloud services and applications run on servers inside them. Each data center
contains thousands of servers organized into clusters, and each cluster can run
multipurpose servers. The infrastructure for GAE is composed of four main
components : the Google File System (GFS), MapReduce, BigTable and Chubby. GFS is
used for storing large amounts of data on Google storage clusters; MapReduce is used
for application development involving data processing on large clusters; Chubby is
used as a distributed application locking service; and BigTable offers a storage service
for accessing structured as well as unstructured data. In this architecture, users interact
with Google applications via the web interface provided by each application.
The GAE platform comprises five main components :
An application runtime environment that offers a platform with a built-in
execution engine for scalable web programming and execution.
A Software Development Kit (SDK) for local application development and
deployment over the Google cloud platform.
A Datastore that provisions object-oriented, distributed, structured data storage
for applications and data, and provides secure data management operations
based on BigTable techniques.
An admin console for easy management of user application development and
resource management.
A GAE web service providing APIs and interfaces.
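A minimal sketch of an application targeting the classic App Engine Python runtime,
using the webapp2 framework bundled with the SDK; the handler and route are
illustrative.

    import webapp2

    class MainPage(webapp2.RequestHandler):
        def get(self):
            # The runtime environment scales instances of this handler
            # automatically; no server management is required.
            self.response.headers["Content-Type"] = "text/plain"
            self.response.write("Hello from Google App Engine!")

    # Route table mapping URLs to handlers; deployed with the SDK's tools.
    app = webapp2.WSGIApplication([("/", MainPage)])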
GFS provides a file system interface and different APIs for supporting different file
operations such as create to create a new file instance, delete to delete a file instance, open to
open a named file and return a handle, close to close a given file specified by a handle,
read to read data from a specified file and write to write data to a specified file.
As can be seen from Fig. 5.6.1, a single GFS Master and three chunk servers serving two clients comprise a GFS cluster. These clients and servers, as well as the Master, are Linux machines, each running a server process at the user level. These processes are known as user-level server processes.
In GFS, the metadata is managed by the GFS Master, which takes care of all the communication between the clients and the chunk servers. Chunks are small blocks of data created from the system files; their usual size is 64 MB. The clients interact directly with the chunk servers for transferring chunks of data. For better reliability, these chunks are replicated across three machines, so that whenever the data is required it can be obtained in its complete form from at least one machine. By default, GFS stores three replicas of each chunk; however, users can specify any level of replication.
Chunks are created by dividing the files into fixed-size blocks. A unique, immutable 64-bit handle is assigned to each chunk at the time of its creation by the GFS Master. The chunk data, identified by these unique handles, is read from and written to local disks by the chunk servers. GFS offers all the familiar file system interfaces and, in addition, snapshot and record-append operations. These two features are responsible for creating a copy of a file or folder structure at low cost and for permitting a guaranteed atomic data-append operation to be performed concurrently by multiple clients on the same file.
Applications access GFS through a file-system-specific Application Programming Interface (API) implemented by the GFS client code. The client, in turn, communicates with the GFS Master and the chunk servers to perform read and write operations on behalf of the application. Clients interact with the Master only for metadata operations; data-bearing communications go directly to the chunk servers. The POSIX API, a feature common to most popular file systems, is not provided by GFS, and therefore no hook into the Linux vnode layer is required. Neither clients nor chunk servers cache file data. Because the workloads are mostly streamed, caching brings little benefit to clients, while on the chunk servers the Linux buffer cache already keeps frequently requested data in memory.
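To make the read path concrete, here is a minimal Python sketch of the flow just described; the master.lookup() and chunk-server read() methods are hypothetical stand-ins, not the real GFS client API.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the usual GFS chunk size

def gfs_read(master, filename, offset, length):
    # Illustrative sketch only: object types and method names are assumed.
    chunk_index = offset // CHUNK_SIZE
    # 1. Metadata from the Master: the chunk handle plus replica locations
    handle, replicas = master.lookup(filename, chunk_index)
    # 2. Data comes directly from a chunk server; the Master never carries data
    chunk_server = replicas[0]
    return chunk_server.read(handle, offset % CHUNK_SIZE, length)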
The GFS provides the following features :
Large-scale data processing and storage support
Treatment of component failures as the norm rather than the exception
It is composed of three entities, namely the client, the Big Table Master and the tablet servers. Big Tables are implemented over one or more clusters that are similar to GFS clusters. The client application uses libraries to execute Big Table queries on the Master server. A Big Table is initially broken up into one or more tablets, which are served by slave servers called tablet servers for the execution of secondary tasks. Each tablet is 100 to 200 MB in size.
The Master server is responsible for allocating tablets to tablet servers, clearing garbage collections and monitoring the performance of the tablet servers. The Master server splits tasks and executes them over the tablet servers. The Master server is also responsible for maintaining a centralized view of the system to support optimal placement and load-balancing decisions, and it keeps control operations strictly separated from data operations with the tablet servers. Upon being granted the tasks, the tablet servers provide row access to clients. Fig. 5.6.3 shows the structure of Big Table :
Big Table is arranged as a sorted map that is spread across multiple dimensions and is sparse, distributed, and persistent. The Big Table data model primarily combines three dimensions, namely row, column, and timestamp. The first two dimensions are string types, whereas the time dimension is a 64-bit integer. The value resulting from the combination of these dimensions is a string.
Each row in Big Table has an associated row key that is an arbitrary string of up to 64 KB in size. A row name is a string, and the rows are kept in lexicographic order. Although Big Table rows do not support the relational model, they offer atomic access to the data, which means you can access only one record at a time. The rows contain a large amount of data about a given entity, such as a web page. The row keys may represent URLs that contain information about the resources referenced by those URLs.
The naming conventions used for columns are more structured than those of rows. Columns are organized into a number of column families that logically group data of the same type under one family. Individual columns are designated by qualifiers within families. In other words, a given column is referred to using the syntax column_family:optional_qualifier, where column_family is a printable string and qualifier is an arbitrary string. It is necessary to provide a name for the first level, the column family, but it is not mandatory to give a name to the qualifier. The column family contains information about the data type and is actually the unit of access control.
Qualifiers are used for assigning columns in each row. The number of columns that can
be assigned in a row is not restricted.
The other important dimension assigned to Big Table is the timestamp. In Big Table, multiple versions of the data in a given cell are indexed by timestamp. The timestamp is either related to real time or can be an arbitrary value assigned by the programmer, and it is used for storing the various versions of the data in a cell. By default, any new data inserted into Big Table is taken as current, but you can explicitly set the timestamp for any new write operation. Timestamps provide a Big Table lookup option that returns the specified number of most recent values. They can also be used for marking the attributes of column families : the attributes either retain the most recent values up to a specified number or keep the values for a particular time duration.
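Conceptually, this three-dimensional data model behaves like the nested, sorted map sketched below in Python; the plain dictionaries merely stand in for the real distributed storage, and the row and column names are illustrative.
import time

# A Big Table cell is addressed by (row key, column family:qualifier, timestamp).
table = {}

def put(row, column, value, timestamp=None):
    # New writes are taken as current unless a timestamp is given explicitly
    ts = timestamp if timestamp is not None else int(time.time())
    table.setdefault(row, {}).setdefault(column, {})[ts] = value

def lookup(row, column, num_versions=1):
    # Return the most recent versions first, as a Big Table lookup does
    versions = table.get(row, {}).get(column, {})
    return sorted(versions.items(), reverse=True)[:num_versions]

put("com.example.www", "contents:html", "<html>v1</html>")
put("com.example.www", "anchor:cnn.com", "Example")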
Big Table supports APIs that can be used by developers to perform a wide range of operations such as metadata operations, read/write operations, or modify/update operations. The operations commonly performed through the APIs are as follows :
Creation and deletion of tables
Creation and deletion of column families within tables
Writing or deleting cell values
Accessing data from rows
Associating metadata, such as access control information, with tables and column families
The functions that are used for atomic write operations are as follows :
Set() is used for writing cells in a row.
DeleteCells() is used for deleting cells from a row.
DeleteRow() is used for deleting an entire row, i.e., all the cells in the row are deleted.
It is clear that Big Table is a highly reliable, efficient, and fast system that can be used for storing different types of semi-structured or unstructured data.
5.6.3 Chubby
Chubby is a crucial service in the Google infrastructure that offers storage and coordination for other infrastructure services such as GFS and Bigtable. It is a coarse-grained distributed locking service used for synchronizing distributed activities in an asynchronous environment on a large scale. It is used as a name service within Google and provides reliable storage for file systems along with the election of a coordinator among multiple replicas. The Chubby interface is similar to the interfaces provided by
distributed systems with advisory locks. However, the aim of designing Chubby is to provide reliable storage with consistent availability. It is designed for use with loosely coupled distributed systems connected by a high-speed network and consisting of several small-sized machines. The lock service enables the synchronization of the activities of clients and permits the clients to reach a consensus about the environment in which they are placed. Chubby's main aim is to efficiently handle a large set of clients by providing them a highly reliable and available system. Its other characteristics, which include throughput and storage capacity, are secondary. Fig. 5.6.4 shows the typical structure of a Chubby system :
The Chubby architecture involves two primary components, namely the server and the client library, which communicate through Remote Procedure Calls (RPCs). The library has a special purpose, i.e., linking the clients against the Chubby cell. A Chubby cell contains a small set of servers. The servers are also called replicas, and usually five servers are used in every cell. The Master is elected from the five replicas through a distributed consensus protocol. A majority of the replicas must vote
for the Master, with the assurance that, for a certain duration, no other Master will be elected by replicas that have once voted for one Master. This duration is termed the Master lease.
Chubby supports a file system similar to that of Unix, although simpler. The files and directories, known as nodes, are contained in the Chubby namespace, and each node is associated with different types of metadata. The nodes are opened to obtain Unix-like file descriptors known as handles. The specifiers for handles include check digits for preventing clients from guessing handles, handle sequence numbers, and mode information for recreating the lock state when the Master changes.
Reader and writer locks are implemented by Chubby using files and directories. While exclusive permission for a lock in writer mode can be held by only a single client, any number of clients can share a lock in reader mode. The locks are advisory, and a conflict occurs only when the same lock is requested again for acquisition. Distributed locking is complex : on one hand its use is costly, and on the other it only permits ordering the interactions that already use locks. The status of locks after they are acquired can be described using specific descriptor strings called sequencers. Sequencers are requested through locks and passed by clients to servers so that interactions can proceed with protection.
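The interplay of advisory locks and sequencers can be sketched in Python as follows; the classes and methods here are hypothetical stand-ins for the Chubby client library, not its real API.
class Sequencer:
    """Describes the state of a lock right after acquisition (illustrative)."""
    def __init__(self, lock_name, mode, generation):
        self.descriptor = f"{lock_name}:{mode}:{generation}"

class AdvisoryLock:
    def __init__(self, name):
        self.name, self.mode, self.generation = name, None, 0

    def acquire(self, mode):
        # Advisory: a conflict arises only when the lock is requested again
        if self.mode == "writer" or (self.mode == "reader" and mode == "writer"):
            raise RuntimeError("conflicting lock request")
        self.mode = mode
        self.generation += 1
        return Sequencer(self.name, mode, self.generation)

lock = AdvisoryLock("/ls/cell/bigtable-master")
seq = lock.acquire("writer")
# The client passes seq.descriptor to servers, which can then check that
# the lock is still held before acting on the client's request.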
Another important term used with Chubby is the event, to which clients can subscribe after the creation of handles. An event is delivered when the action that corresponds to it is completed. An event can be :
a. Modification in the contents of a file
b. Addition, removal, or modification of a child node
c. Failing over of the Chubby Master
d. Invalidity of a handle
e. Acquisition of lock by others
f. Request for a conflicting lock from another client
In Chubby, caching is done by a client that stores file data and metadata to reduce read traffic. Although there is a possibility of caching handles and file locks, the Master maintains a list of the clients that may be caching. Thanks to caching, clients see consistent data; if this is not the case, an error is flagged. Chubby maintains sessions between clients and servers with the help of keep-alive messages, which are required every few seconds to remind the system that the session is still active. Handles held by clients are released by the server in case the session is overdue for any reason. If the Master responds late to a keep-alive message, as may happen at
times, a client has its own timeout (which is longer than the server timeout) for the detection of server failure.
If the server failure has indeed occurred, the Master does not respond to the client's keep-alive message within the local lease timeout. This incident puts the session in jeopardy. It can be recovered in the manner explained in the following points :
The cache needs to be cleared.
The client needs to wait for a grace period, which is about 45 seconds.
Another attempt is made to contact the Master.
If the attempt to contact the Master is successful, the session resumes and its jeopardy is over. However, if this attempt fails, the client assumes that the session is lost. Fig. 5.6.5 shows the case of the failure of the Master.
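The client side of this recovery procedure can be sketched in Python as follows; the timeouts and method names (keep_alive, clear_cache) are assumptions made purely for illustration.
import time

GRACE_PERIOD = 45  # seconds, as described above

def maintain_session(client, master):
    """Illustrative client loop: keep-alives, jeopardy, and recovery."""
    while True:
        if master.keep_alive(timeout=client.local_lease_timeout):
            continue  # lease refreshed, session healthy
        # No response within the local lease timeout: session in jeopardy
        client.clear_cache()
        deadline = time.time() + GRACE_PERIOD
        while time.time() < deadline:
            if master.keep_alive(timeout=1):
                break  # contact re-established, jeopardy over
        else:
            raise ConnectionError("session lost")  # grace period expired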
Chubby offers a decent level of scalability, which means that there can be any (unspecified) number of Chubby cells. If these cells are fed with heavy loads, the lease timeout is increased; this increment can be anything between 12 seconds and 60 seconds. The data is kept in small packages and held in Random-Access Memory (RAM) only. The Chubby system also uses partitioning mechanisms to divide data into smaller packages. With all of its services and applications, Chubby has proved to be a great innovation when it comes to storage, locking, and program support services.
Chubby is implemented using the following APIs :
1. Creation of handles using the open() method
2. Destruction of handles using the close() method
The other important methods include GetContentsAndStat(), GetStat(), ReadDir(), SetContents(), SetACL(), Delete(), Acquire(), TryAcquire(), Release(), GetSequencer(), SetSequencer(), and CheckSequencer(). The commonly used APIs in Chubby are listed in Table 5.6.1 :
API Description
Open Opens the file or directory and returns a handle
Close Closes the file or directory and releases the associated handle
GetContentsAndStat Reads the file contents and returns the metadata associated with the file
Swift : Swift provides storage services for storing files and objects. It can be equated with Amazon's Simple Storage Service (S3).
Cinder : This component provides block storage to Nova virtual machines. Its working is similar to a traditional computer storage system, where the computer is able to access specific locations on a disk drive. Cinder is analogous to AWS's EBS.
Glance : Glance is OpenStack's image service component that provides virtual templates (images) of hard disks. These templates can be used for new VMs. Glance may use either Swift or flat files to store these templates.
Neutron (formerly known as Quantum) : This component of OpenStack provides Networking-as-a-Service, Load-Balancer-as-a-Service and Firewall-as-a-Service. It also ensures communication between other components.
Heat : This is the orchestration component of OpenStack. It allows users to manage the infrastructural needs of applications by allowing the storage of requirements in files.
Keystone : This component provides identity management in OpenStack.
Horizon : This is the dashboard of OpenStack, which provides a graphical interface.
Ceilometer : This component of OpenStack provisions meters and billing models for users of the cloud services. It also keeps an account of the resources used by each individual user of the OpenStack cloud. Let us also discuss some of the non-core components of OpenStack and their offerings.
Trove : Trove is a component of OpenStack that provides Database-as-a-Service. It provisions relational databases and big data engines.
Sahara : This component provisions Hadoop to enable the management of data
processors.
Zaqar : This component allows messaging between distributed application
components.
Ironic : Ironic provisions bare-metal machines, which can be used as a substitute for VMs.
The basic architectural components of OpenStack, shown in Fig. 5.7.1, include its core and optional services/components. The optional services of OpenStack are also known as Big Tent services; OpenStack can be used without these components, or they can be used as per requirement.
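Several of these services can also be driven programmatically. As a hedged illustration (a sketch, not a definitive deployment recipe), the snippet below uses the openstacksdk Python library to boot a Nova VM from a Glance image on a Neutron network; the cloud name "mycloud" and the image, flavor and network names are placeholders.
import openstack  # the openstacksdk client library

# Credentials are read from a clouds.yaml file; "mycloud" is a placeholder
conn = openstack.connect(cloud="mycloud")

image = conn.image.find_image("ubuntu-20.04")    # image served by Glance
flavor = conn.compute.find_flavor("m1.small")    # Nova machine flavor
network = conn.network.find_network("private")   # network managed by Neutron

# Nova provisions the VM; Keystone authenticated the connection above
server = conn.compute.create_server(
    name="demo-vm", image_id=image.id, flavor_id=flavor.id,
    networks=[{"uuid": network.id}])
server = conn.compute.wait_for_server(server)
print(server.status)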
Compatibility : OpenStack supports both private and public clouds and is very easy to deploy and manage. OpenStack APIs are supported in Amazon Web Services. This compatibility eliminates the need for rewriting applications for AWS, thus enabling easy portability between public and private clouds.
Security : OpenStack addresses security concerns, which are the top-most concerns for most organisations, by providing robust and reliable security systems.
Real-time Visibility : OpenStack provides real-time client visibility to administrators, including visibility of resources and instances, thus enabling administrators and providers to track what clients are requesting.
Live Upgrades : This feature allows upgrading services without any downtime. Earlier, upgrades required shutting down complete systems, which resulted in loss of performance. Now, OpenStack enables upgrading running systems by requiring only individual components to shut down.
Apart from these, OpenStack offers other remarkable features, such as networking,
compute, Identity Access Management, orchestration, etc.
From Fig. 5.7.2, we can see that every service of OpenStack depends on other services within the system, and all these services exist in a single ecosystem working together to produce a virtual machine. Any service can be turned on or off depending on the VM required to be produced. These services communicate with each other through APIs and, in some cases, through privileged admin commands.
Let us now discuss the relationship between the various components or services specified in the conceptual architecture of OpenStack. As you can see in Fig. 5.7.2, three components, Keystone, Ceilometer and Horizon, are shown on top of the OpenStack platform.
Here, Horizon provides the user interface for users or administrators to interact with the underlying OpenStack components or services; Keystone provides authentication to the user by mapping the central directory of users to the accessible OpenStack services; and Ceilometer monitors the OpenStack cloud for the purposes of scalability, billing, benchmarking, usage reporting and other telemetry services. Inside the OpenStack platform, various processes are handled by different OpenStack services. Glance registers Hadoop images, provides image services to OpenStack and allows retrieval and storage of disk images. Glance stores the images in Swift, which is responsible for providing a read service and for storing data in the form of objects and files. All other OpenStack components also store data in Swift, which stores data and job binaries as well. Cinder, which offers permanent block storage or volumes to VMs, also stores backup volumes in Swift. Trove stores backup databases in Swift and boots database instances via Nova, which is the main computing engine that provides and manages virtual machines using disk images.
Neutron enables network connectivity for VMs and facilitates the PXE network for Ironic, which fetches images via Glance. VMs are used by the users or administrators to avail themselves of the benefits of cloud services, and all the OpenStack services are used by VMs in order to provide the best services to the users. The infrastructure required for running cloud services is managed by Heat, the orchestration component of OpenStack, which orchestrates clusters and stores the necessary resource requirements of a cloud application. Here, Sahara is used to offer a simple means of provisioning a data processing framework to the cloud users.
Table 5.7.1 shows the dependencies of these services.
Keystone (Identity) : no dependencies
1. Jabber XCP
Instant Messaging (IM) allows users to exchange messages that are delivered synchronously; as long as the recipient is connected to the service, the message is pushed to it directly. This can be realized either using a centralized server or using peer-to-peer connections between the clients.
The Jabber Extensible Communications Platform (Jabber XCP) is a commercial IM server created by Cisco in association with Sun Microsystems. It is a highly programmable presence and messaging platform that supports the exchange of information between applications in real time. It supports multiple protocols such as the Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE) and the Instant Messaging and Presence Service (IMPS). Being a highly programmable and scalable platform makes it ideal for adding presence and messaging to existing applications or services and for building next-generation, presence-based solutions.
In the current scenario, XMPP and XCP are extensively used for federation in the cloud due to their unique capabilities that were previously unavailable. The next sections of this chapter explain the levels of federation along with their applications and services.
Verified federation works at level 2 and runs above permissive federation. At this level, the server accepts a connection from a peer network server only when the identity of the peer is verified or validated. Peer verification is the minimum criterion to run
encryption. The trusted root CAs are identified based on one or more factors such as the operating system environment, the XMPP server software, or the local service policy. The use of trusted domain certificates prevents DNS poisoning attacks but makes federation more difficult, since under such circumstances the certificates are harder to obtain. Here, certificates are signed by a CA.
intermediate links. With federation, the email network can ensure encrypted connections and strong authentication through certificates issued by trusted root CAs.
Apache Hadoop is an open-source software project that enables distributed processing of large data sets across clusters of commodity servers using simple programming models.
The Hadoop core is divided into two fundamental layers called HDFS and the MapReduce engine. HDFS is a distributed file system, inspired by GFS, that organizes files and stores their data on a distributed computing system, while MapReduce is the computation engine running on top of HDFS, which acts as its data storage manager.
HDFS follows a master-slave architecture using a name node and data nodes. The name node acts as the master while multiple data nodes work as slaves.
In the MapReduce model, the data processing primitives used are called the mapper and the reducer. The mapper has a map method that transforms an input key/value pair into any number of intermediate key/value pairs, while the reducer has a reduce method that transforms the intermediate key/value pairs, aggregated per key, into any number of output key/value pairs.
VirtualBox is a Type II (hosted) hypervisor that runs on Microsoft Windows, Mac OS, Linux, and Solaris systems. It is ideal for testing, developing, demonstrating, and deploying solutions across multiple platforms on a single machine.
If, after a certain heartbeat interval (ten minutes by default), the Name Node does not receive any response from a Data Node, then that particular Data Node is declared dead. If the death of a node causes the replication factor of data blocks to drop below their minimum value, the Name Node initiates additional replication to return to the normal state.
Q.3 Name the different modules in Hadoop framework. AU : May-17
Ans. : The Hadoop core is divided into two fundamental modules called HDFS and the MapReduce engine. HDFS is a distributed file system, inspired by GFS, that organizes files and stores their data on a distributed computing system, while MapReduce is the computation engine running on top of HDFS, which acts as its data storage manager. Apart from these, there are several other modules in Hadoop used for data storage, processing and analysis, which are listed below :
a. HBase : Column-oriented NoSQL database service
b. Pig : Dataflow language providing a parallel data processing framework
c. Hive : Data warehouse infrastructure for big data
d. Sqoop : Tool for transferring bulk data between Hadoop and structured data stores
e. Oozie : Workflow scheduler system
f. Zookeeper : Distributed synchronization and coordination service
g. Mahout : Machine learning tool for big data.
Q.4 “HDFS” is fault tolerant. Is it true ? Justify your answer. AU : Dec.-17
Ans. : Fault tolerance refers to the ability of the system to work or operate uninterrupted even under unfavorable conditions (like component failure due to disaster or any other reason). The main purpose of fault tolerance is to mask frequently occurring failures, which disturb the ordinary functioning of the system. The three main solutions used to provide fault tolerance in HDFS are data replication, heartbeat messages, and checkpoint and recovery.
In data replication, HDFS stores multiple replicas of the same data across different clusters based on the replication factor. HDFS uses an intelligent replica placement model for reliability and performance. The same copy of data is positioned on several different computing nodes, so when that data copy is needed it can be provided by any of those data nodes. The major advantage of this technique is instant recovery from node and data failures; the main disadvantage is that storing the same data on multiple nodes consumes a large amount of storage.
In heartbeat messages, a message is sent by each data node to the name node at a regular time interval to indicate its presence, i.e. to indicate that it is alive. If, after a certain heartbeat interval, the name node does not receive any response from a data node, then that particular data node is declared dead. In that case, a replica node takes over as the primary data node so that the data can be recovered.
In checkpoint and recovery, a concept similar to rollback is used to tolerate faults up to some point. After a fixed time interval, a checkpoint of the state is saved and stored. When a failure occurs, the system simply rolls back to the last save point and then starts performing the transactions again.
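A minimal Python sketch of the heartbeat mechanism is shown below, assuming the ten-minute default timeout mentioned earlier; the NameNode class and its methods are illustrative, not Hadoop's actual implementation.
import time

HEARTBEAT_TIMEOUT = 600  # seconds; the ten-minute default mentioned above

class NameNode:
    """Illustrative sketch of heartbeat-based failure detection."""
    def __init__(self):
        self.last_heartbeat = {}  # data node id -> last time it was seen

    def receive_heartbeat(self, datanode_id):
        self.last_heartbeat[datanode_id] = time.time()

    def dead_nodes(self):
        now = time.time()
        return [node for node, seen in self.last_heartbeat.items()
                if now - seen > HEARTBEAT_TIMEOUT]

    def check_cluster(self):
        for node in self.dead_nodes():
            # A dead node may drop blocks below the replication factor,
            # so re-replication of its blocks would be initiated here
            print(f"{node} declared dead; scheduling re-replication")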
Q.5 How is the divide and conquer strategy related to the MapReduce paradigm ?
AU : May-18
Ans. : In the divide and conquer strategy, a computational problem is divided into smaller parts that are executed independently until all parts are completed; the partial results are then combined to get the desired solution of the problem.
MapReduce takes a set of input <key, value> pairs and produces a set of output <key, value> pairs by supplying data through map and reduce functions. The typical MapReduce operations are shown in Fig. 5.2.
Fig. 5.2 : MapReduce operations
In MapReduce, the mapper uses the divide approach, where the input data gets split into blocks; each block is represented as an input key and value pair. The unit of work in MapReduce is a job. During the map phase, the input data is divided into input splits for analysis, where each split is an independent task. These tasks run in parallel across Hadoop clusters. A map function is applied to each input key/value pair; it does some user-defined processing and emits new key/value pairs to intermediate storage to be processed by the reducer. The reducer uses the conquer approach for combining the results. The reduce phase uses the results obtained from the mappers as its input to generate the final result. A reduce function is applied, in parallel, to all values corresponding to each unique map key and generates a single output key/value pair.
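The whole divide and conquer flow can be condensed into a small in-memory Python simulation, shown below as a sketch only; a real MapReduce job distributes the splits across a cluster, and the function names here are illustrative.
from collections import defaultdict
from itertools import chain

def map_reduce(splits, map_fn, reduce_fn):
    """In-memory sketch of the MapReduce flow over a list of input splits."""
    # Divide: apply the map function independently to every split
    mapped = chain.from_iterable(map_fn(split) for split in splits)
    # Shuffle: group all intermediate values by their key
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Conquer: reduce each group to a single output pair
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}

splits = [["Deer", "Bear", "River"], ["Car", "Car", "River"], ["Deer", "Car", "Bear"]]
counts = map_reduce(splits,
                    map_fn=lambda words: [(w, 1) for w in words],
                    reduce_fn=lambda key, values: sum(values))
# counts == {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}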
Q.6 How does the MapReduce framework execute user jobs ? AU : Dec.-18
Ans. : The unit of work in MapReduce is a job. During the map phase, the input data is divided into input splits for analysis, where each split is an independent task. These
Features of MapReduce
The different features provided by MapReduce are explained as follows :
Synchronization : MapReduce supports the execution of concurrent tasks. When concurrent tasks are executed, they need synchronization, which is provided by reading the state of each MapReduce operation during execution and by using shared variables for them.
Data locality : In MapReduce, although the data resides on different clusters, it appears local to the user's application. To obtain the best result, the code and data of an application should reside on the same machine.
Error handling : The MapReduce engine provides different fault tolerance mechanisms in case of failure. When tasks are running on different cluster nodes and a failure occurs, the MapReduce engine finds the incomplete tasks and reschedules them for execution on different nodes.
Scheduling : MapReduce involves map and reduce operations that divide large problems into smaller chunks, which are run in parallel by different machines. So there is a need to schedule the different tasks on the computational nodes on a priority basis, and this is taken care of by the MapReduce engine.
Q.8 Enlist the features of VirtualBox.
Ans. : VirtualBox provides the following main features :
It supports a fully paravirtualized environment along with hardware virtualization.
It provides device drivers from its driver stack, which improve the performance of virtualized input/output devices.
It provides shared folder support to copy data from the host OS to a guest OS and vice versa.
It has the latest virtual USB controller support.
It facilitates a broad range of virtual network drivers along with host, bridge and NAT modes.
It supports the Remote Desktop Protocol to connect to a Windows virtual machine (guest OS) remotely and seamlessly on a thin, thick or mobile client.
It supports the virtual disk formats used by both the VMware and Microsoft Virtual PC hypervisors.
Q.9 Describe Google App Engine.
distribute workload around the globe, move data between disparate networks and implement innovative security models for user access to cloud resources. In federated clouds, the cloud resources are provisioned through network gateways that connect public or external clouds with private or internal clouds owned by a single entity and/or community clouds owned by several co-operating entities.
Q.15 Mention the importance of Transport Level Security (TLS). AU : Dec.-16
Ans. : Transport Layer Security (TLS) is designed to provide security at the transport layer. TLS was derived from a security protocol called the Secure Sockets Layer (SSL). TLS ensures that no third party may eavesdrop on or tamper with any message.
The benefits of TLS are :
a. Encryption : TLS/SSL can help to secure transmitted data using encryption.
b. Interoperability : TLS/SSL works with most web browsers, including Microsoft Internet Explorer, and on most operating systems and web servers.
c. Algorithm flexibility : TLS/SSL provides options for the authentication mechanisms, encryption algorithms and hashing algorithms used during the secure session.
d. Ease of deployment : Many applications use TLS/SSL transparently, for example on a Windows Server 2003 operating system.
e. Ease of use : Because TLS/SSL is implemented beneath the application layer, most of its operations are completely invisible to the client.
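As a small illustration of security at the transport layer, Python's standard ssl module can wrap an ordinary TCP socket in TLS; the host name below is a placeholder.
import socket
import ssl

hostname = "www.example.com"  # placeholder host
context = ssl.create_default_context()  # sensible defaults: cert checks on

with socket.create_connection((hostname, 443)) as sock:
    # wrap_socket performs the TLS handshake: authentication, algorithm
    # negotiation and key exchange happen beneath the application layer
    with context.wrap_socket(sock, server_hostname=hostname) as tls:
        print(tls.version())  # e.g. 'TLSv1.3'
        print(tls.cipher())   # the negotiated cipher suite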
Q.16 Enlist the features of the Extensible Messaging and Presence Protocol (XMPP) for cloud computing.
Ans. : The features of the Extensible Messaging and Presence Protocol for cloud computing are :
a. It is decentralized and supports easy two-way communication.
b. It doesn't require polling for synchronization.
c. It has built-in publish-subscribe (pub-sub) functionality.
d. It works on XML-based open standards.
e. It is perfect for instant messaging features and custom cloud services.
f. It is efficient and scales up to millions of concurrent users on a single service.
g. It supports worldwide federation models.
h. It provides strong security using Transport Layer Security (TLS) and the Simple Authentication and Security Layer (SASL).
i. It is flexible and extensible.
Q.3 Explain the Hadoop Distributed File System architecture with a diagram.
AU : Dec.-18
Ans. : Refer section 5.2 and 5.2.1.
Q.4 Elaborate HDFS concepts with suitable diagram. AU : May-17
Ans. : Refer section 5.2 and 5.2.1.
OR Illustrate the design of Hadoop file system. AU : Dec.-19
Ans. : Refer section 5.2 and 5.2.1.
Q.5 Illustrate dataflow in HDFS during file read/write operation with suitable
diagrams. AU : Dec.-17
Ans. : HDFS follows a master-slave architecture using a name node and data nodes. The name node acts as the master while multiple data nodes work as slaves. HDFS is implemented as a block-structured file system where files are broken into blocks of fixed size and stored on Hadoop clusters. The HDFS architecture is shown in Fig. 5.4.
across the clusters. The name node serves as a single arbitrator and repository for HDFS metadata, which is kept in main memory for faster random access. The entire file system namespace is contained in a file called FsImage stored on the name node's file system, while the transaction log records are stored in the EditLog file.
2. Data Node
In HDFS, multiple data nodes exist that manage the storage attached to the nodes they run on. They are usually used to store users' data on HDFS clusters. Internally, a file is split into one or more blocks stored on data nodes. The data nodes are responsible for handling read/write requests from clients. They also perform block creation, deletion and replication upon instruction from the name node. A data node stores each HDFS data block in a separate file, and several blocks are stored on different data nodes. The requirement of such a block-structured file system is to store, manage and access file metadata reliably.
The representation of name node and data node is shown in Fig. 5.5.
3. HDFS Client
In the Hadoop distributed file system, user applications access the file system using the HDFS client. Like other file systems, HDFS supports operations to read, write and delete files, and operations to create and delete directories. The user references files and directories by paths in the namespace. The user application does not need to be aware that file system metadata and storage are on different servers, or that blocks have multiple replicas. When an application reads a file, the HDFS client first asks the name node for the list of data nodes that host replicas of the blocks of the file. The client then contacts a data node directly and requests the transfer of the desired block. When a client writes, it first asks the name node to choose data nodes to host replicas of the first block of the file. The client organizes a pipeline from node to node and sends the data. When the first block is filled, the client requests new data nodes to be chosen to host replicas of the next block; the choice of data nodes for each block is likely to be different.
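The write path just described can be sketched in Python pseudocode; the name node and data node objects and their methods (create, choose_datanodes, write) are hypothetical stand-ins used only to mirror the description above.
def hdfs_write(name_node, path, data, block_size=64 * 1024 * 1024):
    """Illustrative sketch of the HDFS write path described above."""
    name_node.create(path)  # may fail if the file already exists
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # The name node chooses the data nodes hosting this block's replicas;
        # the choice is likely to differ from block to block
        replicas = name_node.choose_datanodes(path)
        # The client pipelines the block from node to node
        first = replicas[0]
        first.write(block, pipeline=replicas[1:])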
4. HDFS Blocks
In general, the user’s data stored in HDFS in terms of block. The files in file system
are divided in to one or more segments called blocks. The default size of HDFS block is
64 MB that can be increase as per need.
A. Read Operation in HDFS
The Read Operation in HDFS is shown in Fig. 5.6 and explained as follows.
6. Once the end of a block is reached, DFSInputStream closes the connection and moves on to locate the next data node for the next block.
7. Once the client is done with the reading, it calls the close() method.
The Write Operation in HDFS is shown in Fig. 5.7 and explained as follows :
1. A client initiates the write operation by calling the 'create()' method of the DistributedFileSystem object, which creates a new file - step no. 1 in the above diagram.
2. The DistributedFileSystem object connects to the name node using an RPC call and initiates the creation of a new file. However, this file create operation does not associate any blocks with the file. It is the responsibility of the name node to verify that the file (which is being created) does not already exist and that the client has the correct permissions to create a new file. If the file already exists or the client does not have sufficient permission to create a new file, then an IOException is thrown to the client. Otherwise, the operation succeeds and a new record for the file is created by the name node.
3. Once the new record is created in the name node, an object of type FSDataOutputStream is returned to the client. The client uses it to write data into HDFS. The data write method is invoked (step 3 in the diagram).
Ans. : MapReduce takes a set of input <key, value> pairs and produces a set of output <key, value> pairs by supplying data through map and reduce functions. Every MapReduce program undergoes different phases of execution, and each phase has its own significance in the MapReduce framework. The different phases of execution in MapReduce are shown in Fig. 5.8 and explained as follows.
Let us take the example of a word count application, where the input is a set of words. The input to the mapper has three sets of words : [Deer, Bear, River], [Car, Car, River] and [Deer, Car, Bear]. These three sets are taken arbitrarily as input to the MapReduce process. The various stages in MapReduce for the word count application are shown in Fig. 5.9.
In the input phase, the large data set in the form of <key, value> pairs is provided as the standard input for the MapReduce program. The input files used by MapReduce are kept on the HDFS (Hadoop Distributed File System) store, which has a standard InputFormat specified by the user.
Once the input file is selected, the splitting phase reads the input data and divides it into smaller chunks, like [Deer, Bear, River], [Car, Car, River] and [Deer, Car, Bear] as separate sets. The split chunks are then given to the mapper.
The mapper performs map operations to extract the relevant data and generate intermediate key/value pairs. It reads the input data from the split using a record reader and generates intermediate results like [Deer:1; Bear:1; River:1], [Car:1; Car:1; River:1] and [Deer:1; Car:1; Bear:1]. It transforms the input key/value list into an output key/value list, which is then passed to the combiner.
The shuffle and sort are the components of the reducer. Shuffling is the process of partitioning the mapped output and moving it to the reducers, where the intermediate keys are assigned to a reducer. Each partition is called a subset, and each subset becomes input to a reducer. In general, the shuffle phase ensures that the partitioned splits reach the appropriate reducers, where each reducer uses the HTTP protocol to retrieve its own partition from the mappers. The output of this stage would be [Deer:1, Deer:1], [Bear:1, Bear:1], [River:1, River:1] and [Car:1, Car:1, Car:1].
The sort phase is responsible for automatically sorting the intermediate keys on a single node before they are presented to the reducer. The shuffle and sort phases occur simultaneously, as the mapped outputs are fetched and merged. The sort phase orders all intermediate results alphabetically, like [Bear:1, Bear:1], [Car:1, Car:1, Car:1], [Deer:1, Deer:1] and [River:1, River:1]. The combiner is used between the mapper and reducer to reduce the volume of data transfer. It is also known as a semi-reducer, which accepts input from the mapper and passes the output key/value pairs to the reducer. The output of this stage would be [Bear:2], [Car:3], [Deer:2] and [River:2].
The reducer reduces each set of intermediate values that shares a unique key to a smaller set of values. The reducer uses the sorted input to generate the final output. The final output is written by the reducer, using a record writer, into an output file with the standard output format, like [Bear:2, Car:3, Deer:2, River:2]. The final output of each MapReduce program is a set of key/value pairs written to an output file, which is written back to the HDFS store. The word count process using MapReduce, with all phases of execution, is illustrated in Fig. 5.9.
Q.8 Explain the functional architecture of the Google cloud platform for app
engine in detail.
Ans. : Refer section 5.5.
Q.11 Explain the significance of Big table along with its working.
Ans. : Refer section 5.6.2.
Cloud Computing Lab
Contents
Lab 1 : Install VirtualBox and KVM with different flavors of Linux or Windows on top of a host OS ... L - 2
Lab 2 : Install a C compiler in a virtual machine created using VirtualBox and execute simple programs ... L - 10
Lab 3 : Install Google App Engine; create a helloworld app and other simple web applications using Python ... L - 12
Lab 5 : To simulate a cloud scenario using CloudSim and run a scheduling algorithm in it ... L - 16
Lab 6 : Find a procedure to transfer files from one VM to another VM in VirtualBox ... L - 21
Lab 7 : To demonstrate installation and configuration of an OpenStack private cloud ... L - 22
Lab 8 : Install a Hadoop single node cluster and run a simple application like Wordcount ... L - 40
Lab 9 : Explore Storage as a Service using ownCloud for remote file access using web interfaces ... L - 64
Lab 10 : To create and access a Windows virtual machine using AWS EC2 ... L - 69
Lab 11 : To host a WordPress website using the Lightsail service in AWS ... L - 79
Lab 1 : Install Virtual Box and KVM with different flavors of Linux or windows on the
top of host OS
Step 2 : Install it on Windows; once the installation is done, open it.
Step 4 : Specify the RAM size, HDD size, and network configuration, and finish the wizard.
Step 5 : To select the media for installation, click on Start and browse for the ISO file.
Step 1 : Click on new virtual machine and select the operating system type as Windows and the version as Windows 10, along with the name Windows Insider Preview.
Step 2 : Perform steps 2 and 3 the same as for the installation of Ubuntu. In step 4, select the ISO file of Windows 10 instead of Ubuntu and complete the installation. Once the installation is done, Windows will be available as shown below.
The steps to create and run virtual machines in KVM are as follows :
KVM only works if your CPU has hardware virtualization support - either Intel VT-x or
AMD-V. To determine whether your CPU includes these features, run the following
command :
#sudo grep -c "svm\|vmx" /proc/cpuinfo
A 0 indicates that your CPU doesn’t support hardware virtualization, while a 1 or more
indicates that it does.
Virt-Manager is a graphical application for managing your virtual machines. You can use the kvm command directly, but libvirt and Virt-Manager simplify the process.
3) Create User
Only the root user and users in the libvirtd group have permission to use KVM virtual machines. Run the following commands to add your user account (here, tsec) to the libvirtd group :
#sudo adduser tsec
#sudo adduser tsec libvirtd
After running these commands, log out and log back in as tsec.
Run the following command after logging back in as tsec, and you should see an empty list of virtual machines, which indicates that everything is working correctly.
#virsh -c qemu:///system list
Lab 2 : Install C compiler in Virtual machine created using VirtualBox and execute
simple programs
In Lab 1, we have already created an Ubuntu Linux virtual machine. Now let us see how to install a 'C' compiler inside that virtual machine and execute programs. The GCC package needs to be installed to use the C compiler.
The GNU Compiler Collection (GCC) is a collection of compilers and libraries for the C, C++, Objective-C, Fortran, Ada, Go, and D programming languages. Many open-source projects, including the GNU tools and the Linux kernel, are compiled with GCC. The steps for installing GCC and running C programs are as follows.
The output will show the appropriate GCC version, e.g. gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0
#include <stdio.h>

int main(void)
{
    printf("HelloWorld\n");
    return 0;
}
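Assuming the Ubuntu VM from Lab 1 and that the program above is saved as hello.c (the file name is illustrative), GCC can typically be installed and the program compiled and run with :
sudo apt install gcc
gcc hello.c -o hello
./hello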
Lab 3 : Install GoogleApp Engine, create helloworld app, and other simple web
applications using Python.
A) First install the Cloud SDK and then set up a Cloud project for App Engine :
Note : If you already have the Cloud SDK installed, update it by running the following
command :
gcloud components update
4. Initialize your App Engine app with your project and choose its region :
gcloud app create --project=[YOUR_PROJECT_ID]
When prompted, select the region where you want your App Engine application located.
5. Make sure billing is enabled for your project. A billing account needs to be linked to
your project in order for the application to be deployed to App Engine.
It is recommended that you have the latest version of Python, pip, and other related tools
installed on your system. For instructions, refer to the Python Development Environment
Setup Guide. This quick start demonstrates a simple Python app written with the Flask
web framework that can be deployed to App Engine. Although this sample uses Flask,
you can use any web framework that satisfies the requirements above. Alternative
frameworks include Django, Pyramid, Bottle, and web.py.
In the last section, we created a simple Hello World app for Python 3.7. The next section explains how to deploy an app to the Google Cloud.
1. Clone the Hello World sample app repository to your local machine.
git clone https://github.com/GoogleCloudPlatform/python-docs-samples
Alternatively, you can download the sample as a zip file and extract it.
To run the Hello World app on your local computer (the steps below are for Windows), use PowerShell to run your Python packages.
a) Locate your installation of PowerShell.
b) Right-click on the shortcut to PowerShell and start it as an administrator.
c) Create an isolated Python environment in a directory external to your project and
activate it :
1. Deploy the Hello World app by running the following command from the
standard_python37/hello_world directory:
gcloud app deploy
The steps to launch a web application in Google App Engine are as follows :
To use Google's tools for your own site or app, you need to create a new project on
Google Cloud Platform. This requires having a Google account.
1. Go to the App Engine dashboard on the Google Cloud Platform Console and press
the Create button.
2. If you've not created a project before, you'll need to select whether you want to
receive email updates or not, agree to the Terms of Service, and then you should be
able to continue.
3. Enter a name for the project, edit your project ID and note it down. For this tutorial,
the following values are used :
Project Name : GAE Sample Site
Project ID : gaesamplesite
4. Click the Create button to create your project.
Each Cloud Platform project can contain one App Engine application. Let's prepare an
app for our project.
1. We'll need a sample application to publish. If you've not got one to use, download
and unzip this sample app.
2. Have a look at the sample application's structure - the website folder contains your
website content and app.yaml is your application configuration file.
3. Your website content must go inside the website folder, and its landing page must
be called index.html, but apart from that it can take whatever form you like.
4. The app.yaml file is a configuration file that tells App Engine how to map URLs to
your static files. You don't need to edit it.
Now that we've got our project made and sample app files collected together, let's publish
our app.
1. Open Google Cloud Shell.
2. Drag and drop the sample-app folder into the left pane of the code editor.
3. Run the following in the command line to select your project:
gcloud config set project gaesamplesite
5. You are now ready to deploy your application, i.e. upload your app to App Engine:
gcloud app deploy
6. Enter a number to choose the region where you want your application located.
7. Enter Y to confirm.
Source : https://developer.mozilla.org/en-US/docs/Learn/Common_questions/How_do_you_host_your_website_on_Google_App_Engine
Lab 5 : To Simulate Cloud scenario using CloudSim and run a scheduling algorithm
into it.
1. Open up Eclipse and go to the menu section, then click File, keep on clicking New and finally select Java Project, as shown in Fig. 1.
Fig. 1
2. A new window will open. Follow the steps shown in Fig. 2.
Fig. 2 : Give the project a name, select the runtime environment and click Finish
3. Once you hit Finish, an empty project named CloudIntro will be created in the project list, as shown in Fig. 3.
4. The next step is to go to the project CloudIntro and right-click on it. Click Import, as shown in Fig. 4.
5. A new window will open; now click File System, as demonstrated in Fig. 5.
6. The next step is to go to the directory where you have extracted your CloudSim tool. Fig. 6 guides you to the directory where your cloudsim folder is located.
http://commons.apache.org/proper/commons-math/download_math.cgi.
Download the file named "commons-math3-3.4.1-bin.zip" and unzip it; we need its jar files for the math functions.
9. Now go to the left side of the Eclipse tool in the project bar, go to the jar folder, right-click on it and click Import, as shown in Fig. 8.
10. Now go to the folder where you placed the downloaded and extracted file described in point 8. Then all you have to do is select that jar file and hit Finish, as shown in Fig. 9.
11. Finally, CloudSim is installed into your Eclipse environment.
Now write the following program for VM scheduling and run it inside CloudSim. Programs are available at :
a) http://www.cloudbus.org/cloudsim/examples.html
b) https://www.cloudsimtutorials.online/how-to-do-virtual-machine-and-task-scheduling-in-cloudsim/
A shared folder is a folder which makes its files available on both the guest machine and
the host machine at the same time. Creating a shared folder between the guest and the
host allows you to easily manage files which should be present on both machines. The
course virtual machines are ready to use shared folders right away, but if you are using
the virtual machine on your personal computer you will need to specify which folder to
use as shared storage.
If you are using a course VM on a lab computer, it is likely that a shared folder has already been set up for you. On the desktop of your course VM you should notice a folder titled Shared Folders. Inside it you will find any folders that have been shared between the course VM and lab computers. You should see two folders that have already been configured for you : Z_DRIVE and Temp. Z_DRIVE gives you access to your Windows account Z:\ drive. This is storage that is persistent to your SCS account and available as a network drive on the lab computers.
Temp gives you access to the folder found at D:\temp on the lab computer. Files stored in this folder are local to the machine, meaning that they can be accessed faster, but they will be deleted from the system when you log out.
If you are working with data that you will need to use again, use the Z_DRIVE for your
shared folder. If you need faster read/write speed, use the Temp folder, but remember to
back up your files or they will be deleted when you log off the computer.
If you are using your own personal machine, you will need to configure VirtualBox to
look in the right place for your shared files.
First, click on the guest machine you intend to share files with. From there, select the
guest's Settings and navigate to Shared Folders in the left side menu. To create a new
shared folder, either click the New Folder icon on the right menu or right-click the
empty list of shared folders and click Add Shared Folder. From here, there are six options
(a command-line equivalent is sketched after the list).
Folder Path : The location of the folder on the host machine. Click the drop-down menu and
navigate to the folder you would like to share.
Folder Name : This is the name of the folder as it will appear on the guest machine.
Read-Only : If you check read-only, the guest machine will be unable to write
changes to the folder. This is valuable when you only want to send files to the
virtual machine, but do not want to risk having the files modified by the guest.
Auto-Mount : When any external storage is connected to a computer it must be
mounted in order to be used. It is recommended that you turn on auto-mounting,
unless you are familiar with the process of mounting a drive yourself.
Mount Point : Unless you already know about mount points, leave this blank.
Make Permanent : If you check this, the shared folder will be a permanent machine
folder. If it is not checked, the folder will not be shared after a shutdown.
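Equivalently, a shared folder can be created from the host's command line with the VBoxManage tool that ships with VirtualBox; a brief sketch (the VM name and host path are illustrative) :
$ VBoxManage sharedfolder add "CourseVM" --name "Temp" --hostpath "/home/user/shared" --automount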
On the course virtual machines, when you load into the desktop, you should see a
folder labelled SharedFolders. In there you will see any folders that are currently
mounted and being shared.
If you only need to transfer a few files quickly, you can simply drag and drop the files in.
On the top bar of the running guest machine, click on Devices > Drag and Drop and make
sure that Bidirectional is selected. This means that you will be able to drag files from the
host to the guest and from the guest to the host. Once bidirectional drag and drop is
checked, you should be able to begin dragging and dropping files.
You can also drag files from the guest machine into the host. To do this, simply open the
file browser on the host to where you would like to drop the files and drag the files from
the virtual machine into the file browser of the host. File transfers should be pretty quick;
if the virtual machine seems stuck when transferring, simply cancel the transfer and try
again.
Source :
https://carleton.ca/scs/tech-support/virtual-machines/transferring-files-to-and-from-
virtual-machines/
OpenStack can be installed in several ways, for example with RDO Packstack, Mirantis
or DevStack, each of which provides a series of shell scripts that automate the
installation of OpenStack. DevStack is a series of extensible scripts used to quickly bring up a
complete OpenStack environment based on the latest versions of everything from git
master.
To install OpenStack using DevStack, the prerequisites are an Intel or AMD multicore CPU,
a minimum of 6-8 GB RAM, a 250 GB hard disk and a preinstalled Ubuntu Server/Desktop
operating system, version 16.04 or above; the internet speed should be a minimum of 4 Mbps.
(The installation steps can be found at https://docs.openstack.org/devstack/latest/ )
The steps for installing OpenStack using DevStack on a single server (an all-in-one single
machine setup) are given as follows.
The current version of Ubuntu OpenStack is Newton, so that is what we are going to
install. To begin the installation, we first need to use the git command to clone
DevStack.
$sudo apt-get update
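The clone itself is a single git command (the URL assumes the upstream DevStack repository now hosted on OpenDev; guides from the Newton era used git://git.openstack.org/openstack-dev/devstack) :
$ git clone https://opendev.org/openstack/devstack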
Step 3 : Open the devstack directory and start the installation by executing the stack.sh shell script.
$ cd devstack
$ ./stack.sh
At the initial stage, the installer will ask for passwords for the database, RabbitMQ, service
authentication, Horizon and Keystone.
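Alternatively, these passwords can be supplied in advance through a local.conf file placed in the devstack directory, which is the convention the DevStack documentation itself uses; a minimal sketch (the password value is a placeholder) :
[[local|localrc]]
ADMIN_PASSWORD=secret
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD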
The installer may take up to 30 minutes to complete the installation, depending on the
internet bandwidth. Once the installation is done you will see the following screen, which
displays the IP address of the dashboard, i.e. Horizon, through which you can gain access to
OpenStack VMs and resources.
As you can see, two users have been created for you : admin and demo. Your password is
the password you set earlier. These are the usernames you will use to log in to the
OpenStack Horizon dashboard.
Open up a browser and put the Horizon dashboard address in your address bar, e.g.
http://192.168.0.116/dashboard. You should see a login page like this.
To start with, log in with the admin user's credentials. From the admin panel, you will need to
use the demo user, or create a new user, to create and deploy instances. Take note of the
Horizon web address listed in your terminal.
To launch an instance from the OpenStack dashboard, we first need to complete the following steps :
Create a Project and add a member to the Project
Create Image and Flavor
Create Network for the Project
Create Router for the Project
Create a Key pair
Log in to the dashboard using admin credentials, go to the Identity tab –> Projects and
click on Create Project.
Click on "Create Project". We can also set the quota for the project from the Quota tab. To
create users, go to the Identity tab –> Users –> click on the 'Create User' button, then specify the
user name, email, password, primary project and role, and click on Create User to add the user
to the OpenStack workspace.
To create a flavor, log in to the dashboard using admin credentials, go to the Admin tab –>
Flavors –> click on Create Flavor.
Specify the Flavor Name (fedora.small), VCPUs, Root Disk, Ephemeral Disk and Swap Disk.
To create the network and router for the project, sign out of the admin user and log in as the
local user in the dashboard.
For convenience, the network is set up as follows :
Internal Network = 10.10.10.0/24
External Network or Floating IP Network = 192.168.1.0/24
Gateway of External Network = 192.168.1.1
Now, go to the Network tab –> click on Networks –> then click on Create Network.
Specify the network name as Internal.
Click on Next. Then specify the subnet name (sub-internal) and network address
(10.10.10.0/24).
Click on Next. VMs will now get an internal IP from the DHCP server, because we
enabled the DHCP option for the internal network.
Now create the external network. Click on "Create Network" again and specify the network
name as "external".
Click on Next.
Untick the "Enable DHCP" option and specify the IP address pool for the external network.
Click on Create.
Now it is time to create a router. To create the router, go to the Network tab –> Routers –>
click on '+ Create Router'.
Now mark the external network as "External". This task can be completed only by the admin
user, so log out of the local user and log in as admin.
Go to the Admin tab –> Networks –> click on Edit Network for "external".
Click on Save Changes. Now log out of the admin user and log in as the local user. Go to the
Network tab –> Routers –> for Router1, click on "Set Gateway".
Clicking "Set Gateway" will add an interface on the router and assign it the first IP of the
external subnet (192.168.1.0/24).
Add an internal interface to the router as well : click on "router1", select "Interfaces"
and then click on "Add Interface".
The network part is now complete, and we can view the network topology from the "Network
Topology" tab, as shown below.
Now create a key pair that will be used for accessing the VM, and define the security
firewall rules.
Go to the 'Access & Security' tab -> click on Key Pairs -> then click on 'Create Key Pair'.
It will create a key pair named "myssh-keys.pem". Add a new security group named
'fedora-rules' from the Access & Security tab, allowing port 22 and ICMP from the Internet
(0.0.0.0/0).
Once the security group 'fedora-rules' is created, click on Manage Rules and allow port 22 and
ICMP ping.
F) Launch Instance
Now it is finally time to launch an instance. To launch an instance, go to the Compute tab –>
click on Instances –> then click on 'Launch Instance'. Specify the instance name, the
flavor that we created in the above steps, choose 'Boot from image' as the Instance Boot Source
option and select the image name 'fedora-image'.
Click on 'Access & Security' and select the security group 'fedora-rules' and the key pair
"myssh-keys".
Now select Networking, add the 'Internal' network and then click on Launch.
Once the VM is launched, associate a floating IP so that we can access the VM.
Click on Associate
As we can see above, we are able to access the VM using the keys. A typical login from the
host machine is sketched below (the floating IP is illustrative, and Fedora cloud images
normally use the fedora user) :
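$ chmod 600 myssh-keys.pem
$ ssh -i myssh-keys.pem fedora@192.168.1.102
Our task of launching a VM from the dashboard is now complete.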
Lab 8 - Install a Hadoop single node cluster and run a simple application like WordCount.
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus our local
machine. For our single-node setup of Hadoop, we therefore need to configure SSH
access to localhost. So, we need to have SSH up and running on our machine and
configure it to allow SSH public key authentication.
$ ssh-keygen -t rsa -P ""
Add the newly created key to the list of authorized keys so that Hadoop can use ssh
without prompting for a password.
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Disable the IPv6 feature, because using 0.0.0.0 for the various networking-related Hadoop
configuration options will result in Hadoop binding to the IPv6 addresses. To disable it,
open the sysctl.conf file :
$ sudo nano /etc/sysctl.conf
Add the following lines at the end of the sysctl.conf file and reboot the machine.
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
You can check whether IPv6 is enabled on your machine with the following command
(a return value of 0 means IPv6 is enabled, 1 means it is disabled) :
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
The following files will have to be modified to complete the Hadoop setup :
1. ~/.bashrc
2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
3. /usr/local/hadoop/etc/hadoop/core-site.xml
4. /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
1. Configure the bashrc file
We need to find the path where Java has been installed in order to set the JAVA_HOME
environment variable in the bashrc file, so open the bashrc file.
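A typical set of lines to append at the end of ~/.bashrc, assuming Hadoop is installed under /usr/local/hadoop and OpenJDK is in use (adjust JAVA_HOME to the path reported by the readlink command below) :
# HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
# HADOOP VARIABLES END
After saving the file, reload it and verify the Java installation :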
$ source ~/.bashrc
$ javac -version
$ which javac
$ readlink -f /usr/bin/javac
3. Configure core-site.xml
$ sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
4. Configure mapred-site.xml
$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
$ sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
5. Configure hdfs-site.xml
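A representative single-node configuration sets the replication factor to 1 and points the NameNode and DataNode at local directories (the /usr/local/hadoop_store paths are illustrative and the directories must be created and owned by the Hadoop user beforehand) :
$ sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>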
The Hadoop file system needs to be formatted before we can start to use it.
$ hadoop namenode -format
There are two equivalent commands to start all the services of Hadoop. They are given as follows.
$ start-all.sh
$ ./start-all.sh
To verify that all the services are running, issue the jps command. If the output of the jps
command lists the Hadoop daemons (for this setup, processes such as NameNode, DataNode
and SecondaryNameNode), then we can say that Hadoop is successfully installed.
8.2 Write a word count program to demonstrate the use of Map and Reduce tasks.
In this practical, a single-node Hadoop cluster is used. A Hadoop cluster with Eclipse
pre-installed on CentOS is used for running the MapReduce program. The steps
to run the word count program using the MapReduce framework are as follows.
Step 1 - Open Eclipse, create a new Java project, specify a name and click on Finish.
Step 3 - Right-click on the package named wordcount, create a new class in it and name the
class wordcount. The complete class, with the map, reduce and driver code, is given below.
package wordcount;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class wordcount
{
    // Mapper : emits (word, 1) for every token in the input line
    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>
    {
        public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException
        {
            StringTokenizer token = new StringTokenizer(value.toString());
            while(token.hasMoreTokens())
            {
                String word = token.nextToken();
                Text outputKey = new Text(word);
                IntWritable outputValue = new IntWritable(1);
                con.write(outputKey, outputValue);
            }
        } // end of map()
    } //end of Mapper Class

    // Reducer : sums the counts emitted for each word
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable value : values)
                sum += value.get();
            con.write(word, new IntWritable(sum));
        } // end of reduce()
    } // end of Reducer class

    /*
     * Driver : the first command-line argument is the HDFS input path,
     * the second is the output path (which must not exist yet).
     */
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        String[] files = new GenericOptionsParser(conf, args).getRemainingArgs();
        // job definition
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(wordcount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(files[0]));
        FileOutputFormat.setOutputPath(job, new Path(files[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    } // end of main()
}
To add the jar files, right-click on the class file, select the Build Path option and open the
Configure Build Path window. To add the essential libraries, click on the Add External JARs
button and add three jar files one by one. Here we need three jar files, namely hadoop-core.jar,
commons-cli-1.2.jar and core-3.1.1.jar.
Step 6 - Once all the errors have been resolved, right-click on the project, select Export and
choose JAR file, specify a name for it and click on Finish.
Step 7 - Create an input text file and copy both the input and jar files to the Hadoop directory.
To see the output, open the part file which lies inside the output002 directory.
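A typical command sequence for running the job, assuming the jar was exported as wordcount.jar, the input file is input.txt and the job writes to /output002 (all of these names are illustrative) :
$ hadoop fs -mkdir /input
$ hadoop fs -put input.txt /input
$ hadoop jar wordcount.jar wordcount.wordcount /input/input.txt /output002
$ hadoop fs -cat /output002/part-r-00000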
Lab 9 : Explore Storage as a Service using ownCloud for remote file access using
web interfaces.
ownCloud is a suite of client-server software for creating and using file hosting services.
ownCloud is functionally very similar to the widely used Dropbox, with the primary
functional difference being that the Server Edition of ownCloud is free and open-source,
thereby allowing anyone to install and operate it without charge on a private server.
It also supports extensions that allow it to work like Google Drive, with online document
editing, calendar and contact synchronization, and more. Its openness avoids enforced
quotas on storage space or the number of connected clients, instead having hard limits
(like on storage space or number of users) defined only by the physical capabilities of the
server.
ownCloud can be installed on any flavor of Linux, such as Ubuntu, CentOS, Fedora, etc.,
but Ubuntu is preferable. The steps for installation are as follows.
The ownCloud server package does not exist within the default repositories for Ubuntu.
However, ownCloud maintains a dedicated repository for the distribution that we can
add to our server.
To begin, download their release key using the curl command and import it with the apt-key
utility's add command :
$ curl https://download.owncloud.org/download/repositories/10.0/Ubuntu_18.04/Release.key | sudo apt-key add -
The 'Release.key' file contains a PGP (Pretty Good Privacy) public key which apt will use
to verify that the ownCloud package is authentic.
Now execute following commands on the terminal
1) $ echo 'deb http://download.owncloud.org/download/repositories/10.0/Ubuntu_18.04/ /' | sudo
tee /etc/apt/sources.list.d/owncloud.list
2) $sudo apt update
3) $ sudo apt install php-bz2 php-curl php-gd php-imagick php-intl php-mbstring php-xml php-zip
owncloud-files
The ownCloud package we installed copies the web files to /var/www/owncloud on the
server. Currently, the Apache virtual host configuration is set up to serve files out of a
different directory. We need to change the DocumentRoot setting in our configuration to
point to the new directory.
$sudo apache2ctl -t -D DUMP_VHOSTS | grep server_domain_or_IP
Now edit the Configuration file and add following lines so that it points to the
/var/www/owncloud directory:
$sudo nano /etc/apache2/sites-enabled/server_ domain_or_IP.conf <VirtualHost *:80>
...
DocumentRoot /var/www/owncloud
...
</VirtualHost>
When you are finished, check the syntax of your Apache files to make sure there are no
detectable typos in your configuration :
$ sudo apache2ctl configtest
Output - Syntax OK
To access the ownCloud web interface, open a web browser and navigate to the server's IP
address, as shown below.
The ownCloud portal has two types of users : the admin user and local users. The admin user
can create users/groups, assign storage quotas, assign privileges and manage user
and group activities.
The local user is a restricted user who can perform local activities such as uploading or
sharing files, deleting local shares, creating shares, etc.
An alternative way to use ownCloud is to download the ready-made virtual machine from
https://bitnami.com/stack/owncloud/cloud, which can be run directly on a
virtualization platform like VirtualBox or VMware Workstation.
Lab 10 - To Create and access Windows Virtual machine using AWS EC2.
[Note : The following three labs are performed on an AWS Free Tier account, which is almost
free for everyone, so please create an AWS Free Tier account from
https://aws.amazon.com/free/ ].
The steps to create and access a Windows virtual machine using AWS EC2 are as follows.
Step 1 - Log in to the AWS portal and select the EC2 service from the admin console.
Step 2 - The EC2 resource page will appear, showing a summary of instances.
Now click on Launch Instance to select the VM instance type.
Step 3 - Select the operating system type in AMI format. In this example we have
selected a Windows Server instance, which is eligible for the free tier, and click on Next.
Step 4 - Now select the hardware type for the virtual machine. In this example we have
selected free-tier-eligible general purpose hardware; click on Next.
Step 5 - Now specify the instance details like the number of instances and networking options
such as VPC, subnet or DHCP public IP, etc., and click on Next.
Step 10 - Now, to secure access to the VM instance, create a key pair : AWS encrypts the
Windows administrator password with the public key, and the private key is used to decrypt
it. Specify a key pair name and download the key pair.
Step 12 - Now, from the summary page, click on View Instances to see the instance state.
After some time you will see the running instance of your VM.
Step 13 - Now click on Connect to get the password for the VM so it can be accessed over the
RDP protocol.
Step 14 - Select the downloaded key pair file to decrypt the password.
Step 15 - Now connect to the instance using an RDP tool, supplying the IP address/DNS name,
username and the password decrypted in the last step.
Step 16 - Once you click on Connect, you will see the running Windows virtual machine
as shown below.
Step 17 - You can shut down the instance by selecting Instance State followed by Stop.
Step 18 - You can delete the instance permanently by selecting Instance State followed
by Terminate.
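The same instance lifecycle can also be driven from the AWS CLI; a brief sketch (the AMI ID, key pair name and instance ID are placeholders) :
$ aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t2.micro --key-name mykeypair
$ aws ec2 describe-instances --instance-ids i-0123456789abcdef0
$ aws ec2 stop-instances --instance-ids i-0123456789abcdef0
$ aws ec2 terminate-instances --instance-ids i-0123456789abcdef0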
Lab 11 : To host a WordPress website using the Lightsail service in AWS.
Step 1 - Open the admin console of AWS and select the Lightsail service.
Step 7 - Click on Connect to the instance to get the password for WordPress.
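Lightsail's WordPress blueprint is built by Bitnami, so the application password can typically also be read from the instance's browser-based SSH session (the file name follows the Bitnami convention) :
$ cat $HOME/bitnami_application_password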
Step 9 - Now reserve a static IP by selecting the Networking option and creating a static IP.
Once the static IP is allocated, open that IP in a browser to see the WordPress website.
Open the admin console of WordPress and use the password obtained in step 8 to open the
WordPress site builder.
Now you can develop a complete WordPress website and use it.
Step 2 - Now, click on Create Bucket to create a storage medium to store the user's data.
Specify a name for the bucket along with the region where you want to create it.
On the next screen, select the versioning and tag options if required; otherwise click on Next.
On the next screen, set the public access settings for the bucket and the associated files as per
requirements.
Now, click on Add Files to add files from the local computer, followed by clicking the Upload
button.
During upload, set the user access permissions and storage classes if required. Upon
successful upload, the file stored in the S3 bucket is shown below.
By opening a file, you can view its various attributes and the object URL through which
users can download the file once it is made public.
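The same bucket operations can be scripted with the AWS CLI; a short sketch (the bucket name is a placeholder and must be globally unique) :
$ aws s3 mb s3://my-demo-bucket-12345 --region us-east-1
$ aws s3 cp myfile.txt s3://my-demo-bucket-12345/
$ aws s3 ls s3://my-demo-bucket-12345/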
Model Question Paper
Instructions :
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
Part – A (10 × 2 = 20 Marks)
Q.1 Highlight the importance of Cloud Computing.
Q.6 What are the core components of Google app engine architecture ?
OR
b) Explain in detail web services protocol stack and publish-subscribe models with respect to
web services.
Q.12 a) What is virtualization ? Describe para and full virtualization architectures, compare and
contrast them.
OR
OR
b) Explain the baseline Identity and Access Management (IAM) factors to be practised by the
stakeholders of cloud services and the common key privacy issues likely to arise in the
cloud environment.
Part – C (15 × 1 = 15 Marks)
Q.16 a) Write detailed note on Resource Provisioning along with different Resource Provisioning
Methods.
OR