
SUBJECT CODE : CS8791

Strictly as per Revised Syllabus of


Anna University
Choice Based Credit System (CBCS)
Semester - VII (CSE / IT)

Cloud Computing
Dr. Bhushan Jadhav
Ph.D. Computer Engineering
Assistant Professor, Information Technology Department,
Thadomal Shahani Engineering College,
Bandra, Mumbai.

Sonali Jadhav
M.E. Computer Engineering
Assistant Professor, Computer Engineering Department,
D. J. Sanghvi College of Engineering,
Mumbai.

TECHNICAL
PUBLICATIONS
SINCE 1993 An Up-Thrust for Knowledge

Cloud Computing
Subject Code : CS8791

Semester - VII (Computer Science and Engineering and Information Technology)

First Edition : September 2020

© Copyright with Authors


All publishing rights (printed and ebook version) reserved with Technical Publications. No part of this book
should be reproduced in any form, Electronic, Mechanical, Photocopy or any information storage and
retrieval system without prior permission in writing, from Technical Publications, Pune.

Published by :
Technical Publications
Amit Residency, Office No. 1, 412, Shaniwar Peth,
Pune - 411030, M.S. INDIA, Ph.: +91-020-24495496/97
Email : sales@technicalpublications.org  Website : www.technicalpublications.org

Printer :
Yogiraj Printers & Binders
Sr.No. 10\1A,
Ghule Industrial Estate, Nanded Village Road,
Tal-Haveli, Dist-Pune - 411041.

ISBN 978-93-90041-22-0



1 Introduction
Syllabus
Introduction to Cloud Computing, Definition of Cloud Computing, Evolution of Cloud Computing,
Underlying Principles of Parallel and Distributed Computing, Cloud Characteristics, Elasticity in
Cloud, On-demand Provisioning, Challenges in Cloud Computing

Contents
1.1 Introduction to Cloud Computing
1.2 Definition of Cloud Computing
1.3 Evolution of Cloud Computing
1.4 Underlying Principles of Parallel and Distributed Computing
1.5 Cloud Characteristics
1.6 Elasticity in Cloud
1.7 On-demand Provisioning
1.8 Challenges in Cloud Computing


1.1 Introduction to Cloud Computing

Cloud computing has become a must-have technology in every IT organization because of its prominent benefits over existing computing technologies. It is closely related to other computing models like grid computing, distributed computing and parallel computing, coupled with virtualization. It is intended to make better use of distributed hardware and software resources, which are combined to achieve higher throughput at lower cost and to solve large-scale computation problems in less time. Basically, cloud computing aggregates computing resources (like CPUs and memory), networking solutions, storage management and virtualization solutions, which are available on demand and delivered economically.

Today, the use of cloud computing is massive because it can deliver resources and services of any size, at any time and at an economical cost, without the user having to set up anything. The services provided by the cloud follow a utility-based model that charges only for the services actually used, and it does not require the user to set up or install any software or hardware. It is a distributed computing model through which users can gain access to a shared pool of resources, applications and services from anywhere, at any time, on any connected device. Earlier, organizations used to keep various servers in their server rooms : separate file servers to store documents, database servers to store records, transaction servers to run transactions, exchange servers to run e-mail applications and web servers for web hosting. The setup and upfront cost of every server was high, as it required very expensive hardware, costly server operating systems and related application/database software. Apart from that, it also needed additional manpower : a network administrator to look after the network, a system administrator to manage the hardware, database administrators to manage the databases, a web administrator to manage the websites, and so forth. Thus, the capital expense associated with such arrangements was excessively high and was not affordable for small and medium-size organizations. Research into these traditional arrangements eventually gave rise to cloud computing.


Fig. 1.1.1 : Various aspects of cloud computing


Nowadays, setting up a separate server room is becoming history. In cloud computing, organizations neither have to spend much on capital nor buy or set up any servers; instead, the servers are kept at the cloud provider's remote data center and offered for a small subscription charge. Cloud services are provided through a pay-as-you-go model, which generates the bill based on per-hour, per-minute or per-second usage, or through a subscription model, which generates the usage bill after a month, quarter or year. Both models allow users to pay only for what they have used. The cloud service provider takes care of the servers installed at its data centers and used by its clients, and provides multiple services under a single web portal, such as compute, storage, network, security and analytics. In short, cloud computing gives dynamic delivery of IT services over the web with eminent features like rapid scalability, cost savings in infrastructure, minimized licensing requirements, on-demand self-service, disaster recovery, load balancing, flexibility of services, high performance, high availability, and access from anywhere, at any time, on any device. The various aspects of cloud computing, namely its service models, deployment models, pricing models, stakeholders and technologies, along with its features and challenges, are shown in Fig. 1.1.1.
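To make the pay-as-you-go idea concrete, the short Python sketch below totals a bill from metered usage records. The services, hours and hourly rates are made-up figures for illustration only, not the prices of any real provider.

# Illustrative sketch of pay-as-you-go billing (hypothetical services and rates).
# Metered usage records : (service, hours used, rate per hour in rupees)
usage = [
    ("compute-vm", 120, 4.50),      # a small virtual machine
    ("block-storage", 720, 0.10),   # storage metered for the whole month
    ("load-balancer", 120, 1.25),
]

total = 0.0
for service, hours, rate_per_hour in usage:
    cost = hours * rate_per_hour          # pay only for what was actually used
    total += cost
    print(f"{service:15s} {hours:5d} h x Rs {rate_per_hour:5.2f}/h = Rs {cost:8.2f}")

print(f"{'Total bill':15s} {'':20s} Rs {total:8.2f}")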


1.1.1 Cloud Computing and Other Similar Configurations


Cloud computing is often compared with other computing architectures like peer-to-peer, client-server, grid computing, distributed computing and cluster computing. These configurations are explained as follows.

a) Peer to Peer Architecture


A peer-to-peer architecture is a collection of hosts connected in a network, intended for resource sharing, task processing and communication. Each host on the network acts as a server and has equal rights in terms of providing and using resources, and users are authenticated by each individual workstation. In a peer-to-peer architecture, there is no controller or server that controls the access and communication between hosts, which can create performance bottlenecks. Communication in a peer-to-peer architecture is completely decentralized. The generalized peer-to-peer architecture is shown in Fig. 1.1.2. The limitations of the peer-to-peer architecture are lack of scalability, poor performance, low throughput, limited flexibility and so on. Cloud computing overcomes these limitations by making the architecture fully centralized; it can provide automated scalability, high performance, flexibility and mobility.

Fig. 1.1.2 : Peer to peer architecture


Other limitations of the peer-to-peer architecture are that it incurs additional capital cost for implementation and that it generates too many request/reply messages, which causes congestion in the network and makes it difficult to manage the traffic flow.

b) Client Server Architecture


In a client-server architecture, there is at least one specialized server that controls the communication between multiple clients. Typically, there is a server, called the controller, which provides access to network services like shared files, shared printers, storage hardware and applications to the clients, who are the requesters.

Fig. 1.1.3 : Client server Architecture

The server is responsible for handling resource sharing, task processing and communication between the clients, so clients have to rely on the server for various access and services. It is faster than the peer-to-peer architecture because the server is responsible for granting and denying permission for access. The generalized client-server architecture is shown in Fig. 1.1.3.
The client-server architecture has centralized processing, which makes communication faster and gives good performance. Cloud computing also follows the client-server architecture, but on a massive scale, giving seamless delivery of services with flexibility, scalability and mobility at lower cost.

c) Grid Computing
The grid computing architecture has geographically distributed computing resources that work together to perform a common task. A typical grid is a pool of loosely coupled computers that work together to solve a complex computational problem. It has heterogeneous resources that are controlled by a common node, called the control node, as in the client-server architecture.


Conceptually, grid computing works similarly to cloud computing in providing services through a shared pool of resources. Grid computing follows a distributed architecture, while cloud computing follows a centralized distributed architecture. In grid computing, the compute resources are distributed across cities, countries and continents and are therefore managed in a completely distributed manner, while in cloud computing the resources are distributed but managed centrally. Cloud computing is advantageous over grid computing in terms of availability, scalability, flexibility, disaster recovery and load balancing.
d) Distributed computing
It is a computing concept that refers to multiple computer systems working on a single
problem. In distributed computing, a single problem is divided into many parts, and each
part is executed by different computers. As long as the computers are networked, they
can communicate with each other to solve the problem. If it is done properly, the
computers perform like a single entity. The ultimate goal of distributed computing is to
maximize performance by connecting users and IT resources in a cost-effective,
transparent and reliable manner. This type of computing is highly scalable.

e) Cluster computing
Cluster computing is also intended to solve a complex computational problem using a group of computers connected through a network. Generally, a cluster is a collection of interconnected, loosely coupled, homogeneous computers that work together closely, so that in some respects they can be regarded as a single computer. Each cluster is composed of multiple standalone machines connected by a network. Modern clusters are typically designed to handle more difficult problems that require nodes to share intermediate results with each other very often, which requires a high-bandwidth, low-latency interconnection network.
The computers in a cluster are loosely coupled, and the local operating system of each computer manages its own resources. Therefore, the cluster needs to merge multiple system images into a single system image to support sharing of CPUs, memories and I/O across cluster nodes. The single system image (SSI) can be formed only with the help of middleware, which makes the cluster appear like a single machine to the user. Without middleware, clusters cannot work efficiently to achieve cooperative computing.


The generalized architecture of cluster computing, based on the master-slave approach, is shown in Fig. 1.1.4. Clusters also support massive parallelism with the help of compute nodes (like workstations and servers) and communication software like PVM or MPI. A cluster is capable of running both sequential and parallel applications. Cloud computing is advantageous over cluster computing in terms of resource management, scalability, reliability and cost.

Fig. 1.1.4 : Architecture of cluster computing

1.1.2 Advantages of Cloud Computing


There are many advantages of cloud computing, some of them are explained as
follows.
 Improved accessibility : Cloud computing provides efficient access to services and resources from anywhere, at any time, on any device.
 Optimum Resource Utilization : Servers, storage and network resources are better utilized in the cloud environment as they are shared among multiple users, which cuts down the wastage of resources.
 Scalability and Speed : Cloud computing provides high scalability, where the capacity of hardware, software or network resources can be easily increased or decreased based on demand. Organizations do not have to invest money and time in buying and setting up hardware, software and other resources; instead, they can easily scale their cloud resources or services up or down as per demand, with rapid speed of access.

 Minimizes licensing cost of the software : The remote delivery of software applications saves licensing cost, as users do not need to buy or renew expensive software licenses or programs.
 Less personnel training : Users of the cloud do not need much training to deploy or access cloud services, as the self-service portals of the cloud have user-friendly GUIs with which anyone can work easily.
 Flexibility of work practices : Cloud computing provides its users freedom of access, so employees can work more flexibly. The flexibility of access to cloud data allows employees to work from home or while on holiday, through the internet.
 Sharing of resources and costs : Cloud computing fulfills the users' requirement to access resources through a shared pool, which can be scaled easily and rapidly to any size. The sharing of resources saves huge cost and makes efficient use of the infrastructure.
 Minimize spending on technology infrastructure : As public cloud services are readily available, the pay-as-you-go or subscription-based utility feature of the cloud allows you to access cloud services economically at a cheaper rate. Therefore, it reduces spending on in-house infrastructure.
 Maintenance is easier : As cloud computing services are provided by the service provider through the internet, the maintenance of services is easier and is managed by the cloud service providers themselves.
 Less Capital Expenditure : There is no need to spend big money on hardware, software or licensing fees, so capital expenditure is very low.
 On-demand self-service : The cloud provides automated provisioning of services
on demand through self-service websites called portals.
 Broad network access : The cloud services and resources are provided through a
location independent broad network using standardized methods.
 Resource pooling : The cloud service provider pools resources together into a resource pool, through which users can fulfill their requirements; the pool can be made easily available for a multi-tenant environment.
 Measured services : The usage of cloud services can be easily measured using different measuring tools to generate a utility-based bill. Some of these tools can also be used to generate usage reports and to audit and monitor services.
 Rapid elasticity : The cloud services can be easily, elastically and rapidly
provisioned and released through a self-service portal.

 Server Consolidation : Server consolidation in cloud computing is an effective approach to maximize resource utilization while minimizing energy consumption in a cloud computing environment. The virtualization technology provides the server consolidation feature in cloud computing.
 Multi-tenancy : Multi-tenancy in the cloud computing architecture allows customers to share the same computing resources in different environments. Each tenant's data is isolated and remains invisible to other tenants. It provides individualized space to users for storing their projects and data.

1.2 Definition of Cloud Computing


There are many definitions of cloud computing proposed by different standards organizations. The most prominent and standardized definition of cloud computing, given by the National Institute of Standards and Technology (NIST), a U.S. government entity, says that :

Definition :
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

1.3 Evolution of Cloud Computing


Cloud computing became very popular in a short span of time, delivering prominent and unique benefits that were never available before. Therefore, it is important to understand the evolution of cloud computing. In this section we are going to understand the evolution of cloud computing with respect to hardware, internet, protocol, computing and processing technologies.

1.3.1 Evolution of Hardware


a) First-Generation Computers : The first-generation computing hardware, the Mark and Colossus machines, was used for solving binary arithmetic. Its development, beginning in the 1930s, became the foundation for programming languages, computer processing and related terminology. The first generation evolved with a second version in 1943 at Harvard University, an electromechanical programmable computer in the Mark and Colossus line. It was developed using vacuum tubes and hard-wired circuits, and punch cards were used to store data.


b) Second-Generation Computers : The second-generation computing hardware, ENIAC (Electronic Numerical Integrator and Computer), was built in 1946 and was capable of solving a range of computing problems. It could perform one lakh (100,000) calculations per second. It was composed of thermionic valves, transistors and circuits.
c) Third-Generation Computers : Third-generation computers were produced from 1958 onwards using integrated circuits (ICs). The first mainframe computer by IBM was developed in this era; the IBM 360 gained more processing and storage capability due to the integrated circuit. The minicomputer was also developed in this era. At a later stage, Intel released the first commercial microprocessor, the Intel 4004, which had multiple transistors integrated on a single chip to perform processing at a faster speed.
d) Fourth-Generation Computers : Fourth-generation computers introduced microprocessors built as single integrated circuits, together with random-access memory, executing millions of instructions per second. In this phase, IBM developed the personal computer in 1981, along with LSI and VLSI microchips.

1.3.2 Evolution of Internet and Protocols


The evolution of the internet began in the 1930s with the concept of the MEMEX for storing book records and communication. In 1957, the Soviet Union launched the first satellite, which prompted the creation of the Advanced Research Projects Agency (ARPA) for the US military. The internet was first introduced with the creation of ARPANET in 1967, which used an Interface Message Processor (IMP) at each site for communication. In ARPANET, host-to-host protocols were initially used for communication, later supplemented by application protocols like FTP and SMTP. In 1983, ARPANET adopted the flexible and powerful TCP/IP protocol suite, which is used over the internet to this day. The initial version of the Internet Protocol, IPv4, later evolved into the new-generation IPv6 protocol. The first web browser was developed in 1990 by Berners-Lee, followed by the Mosaic browser in 1993 and the Netscape browser in 1994. In 1995, Microsoft developed the Windows 95 operating system with an integrated browser called Internet Explorer, along with support for dial-up TCP/IP protocols. The first web servers based on the Hypertext Transfer Protocol were followed by various scripting-enabled web servers and web browsers.


1.3.3 Evolution of Computing Technologies


A few decades ago, the popular computing technology for processing complex and large computational problems was cluster computing, in which a group of computers was used to solve a larger computational problem as a single unit. It was designed in such a way that the computational load was divided into similar units of work allocated across multiple processors, with the load balanced across the several machines. In the 1990s, cluster computing evolved into the concept of grid computing, developed by Ian Foster. Grid computing is a group of interconnected independent computers intended to solve a common computational problem as a single unit, usually geographically distributed across different locations such as cities, countries or continents. It was analogous to electric grids, where users are allowed to plug in and use computing power like a utility service. The main limitation of grid computing was data residency, as data was located and stored at geographically diverse locations far away from each other. Grid computing therefore evolved further into cloud computing, where a centralized entity such as a data center offers different computing services to others, in a manner similar to the grid computing model. Cloud computing became more popular with the introduction of virtualization technology. Virtualization is a method of running multiple independent virtual operating systems on a single physical computer. It saves hardware cost due to consolidation of multiple servers, along with maximum throughput and optimum resource utilization.

1.3.4 Evolution of Processing Technologies


When computers were initially launched, people worked with mechanical devices, vacuum tubes, transistors, etc. Then, with the advent of Small-Scale Integration (SSI), Medium-Scale Integration (MSI), Large-Scale Integration (LSI) and Very-Large-Scale Integration (VLSI) technology, circuits with very small dimensions became more reliable and faster. This development in hardware technology gave a new dimension to the design of processors and their peripherals. Processing is nothing but the execution of programs, applications or tasks on one or more computers. The two basic approaches to processing are serial and parallel processing. In serial processing, the given problem or task is broken into a discrete series of instructions, which are executed on a single processor sequentially. In parallel processing, the programming instructions are executed simultaneously across multiple processors with the objective of running the program in less time. The next advancement in parallel processing was multiprogramming.

In a multiprogramming system, multiple programs are submitted at the same time for execution, and each program is allowed to use the processor for a specific allotted period of time. Each program gets an equal amount of processor time in a round-robin manner in order to execute its instructions. Later, multiprogramming systems evolved into vector processing. Vector processing was developed to increase processing performance by operating in a multitasking manner. It was specially designed to perform matrix operations, allowing a single instruction to manipulate two arrays of numbers when performing arithmetic operations. Vector processing was used in applications where the data is generated in the form of vectors or matrices. The next advancement over vector processing was the development of symmetric multiprocessing (SMP) systems. As multiprogramming and vector processing systems have the limitation of managing resources in a master-slave model, symmetric multiprocessing systems were designed to address that problem. SMP systems are intended to achieve sequential consistency, where each processor is assigned an equal share of OS tasks, and the processors are responsible for managing the workflow of task execution as it passes through the system. Lastly, massively parallel processing (MPP) was developed, with many independent arithmetic units or microprocessors that run in parallel and are interconnected to act as a single very large computer. Today, massively parallel processor arrays can be implemented on a single chip, which has become cost effective due to integrated-circuit technology, and they are mostly used in advanced computing applications such as artificial intelligence.

1.4 Underlying Principles of Parallel and Distributed Computing


In the previous section, we saw the evolution of cloud computing with respect to its hardware, internet, protocol and processing technologies. This section briefly explains the principles of two essential computing mechanisms largely used in cloud computing, called parallel and distributed computing. Computing in computer technology can be defined as the execution of single or multiple programs, applications, tasks or activities, sequentially or in parallel, on one or more computers. The two basic approaches to computing are serial and parallel computing.


1.4.1 Serial Computing


In serial computing, the given problem or task is broken into a discrete series of instructions. These instructions are executed on a single processor sequentially, as shown in Fig. 1.4.1. The drawback of serial computing is that it executes only one instruction at any moment of time. It is mostly used for monolithic applications on a single machine that do not have any time constraints.

Fig. 1.4.1 : Serial Computing

1.4.2 Parallel Computing


A single-processor system is becoming archaic and inadequate for the fast computation required by real-time applications, so parallel computing is needed to speed up the execution of real-time applications and achieve high performance. Parallel computing makes use of multiple computing resources to solve a complex computational problem, in which the problem is broken into discrete parts that can be solved concurrently, as shown in Fig. 1.4.2.

Fig. 1.4.2 : Parallel Computing


Each part is further broken down into a series of instructions, which execute simultaneously on different processors using an overall control/coordination mechanism. Here, the different processors share the workload, which produces much higher computing power and performance than could be achieved with a traditional single-processor system.
Parallel computing is often correlated with parallel processing and parallel programming. Processing multiple tasks and subtasks simultaneously on multiple processors is called parallel processing, while parallel programming refers to programming on a multiprocessor system using the divide-and-conquer technique, where a given task is divided into subtasks and each subtask is processed on a different processor.
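As a minimal sketch of the divide-and-conquer style of parallel programming described above, the following Python example splits a task into subtasks and processes them on multiple processors using the standard multiprocessing module; the work function and problem size are arbitrary choices for illustration.

# Sketch : divide a task into subtasks and run them on multiple processors.
from multiprocessing import Pool

def partial_sum(chunk):
    # Subtask : each worker sums one part of the data independently.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Divide the problem into four roughly equal parts.
    parts = [data[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        # Each part is processed concurrently on a separate processor.
        results = pool.map(partial_sum, parts)
    # Combine the partial results (the "conquer" step).
    print(sum(results))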

1.4.2.1 Hardware Architectures for Parallel Processing

In parallel processing, the CPU is the core component responsible for executing the tasks and subtasks of the programs. Each program consists of two kinds of streams, called instruction streams and data streams, which are observed by the CPU during program execution. Therefore, the hardware architecture of parallel computers is characterized by Flynn's classification method. Flynn characterized parallel computers in terms of the number of instruction streams over the data streams. A flow of operands (data) between the processor and memory is called a data stream, while a flow of instructions is called an instruction stream. Flynn's classification depends upon the number of streams flowing at any point of execution. The basic classification stated by Flynn is shown in Fig. 1.4.3.

Fig. 1.4.3 : Flynn’s Classification for parallel computers


a) Single Instruction, Single Data (SISD)


It is a serial (non-parallel) computer that executes a single instruction over a single data stream. Single instruction means that only one instruction stream is being acted on by the CPU, while single data means that only one data stream is being used as input during one clock cycle, as shown in Fig. 1.4.4.

Fig. 1.4.4 : SISD architecture

This type of computer performs sequential computing and gives low performance. Examples of SISD are older-generation computers and computers with a single non-pipelined processor.

b) Single Instruction, Multiple Data (SIMD)


It is a type of parallel computer in which all processing elements execute the same
instruction at any given clock cycle and each processing element operates on a different
data element. All the processing elements receive the same instruction broadcasted from
the control unit.

Fig. 1.4.5 : SIMD architecture.


Each processing element takes the data from its own memory. It is best suited for
specialized problems characterized by a high degree of regularity, such as
graphics/image processing. It has synchronous (lockstep) and deterministic execution. It
uses parallel architectures like array processors and vector pipelines. Most of the modern
computers employ SIMD architecture as shown in Fig. 1.4.5. Examples of SIMD
organization are ILLIAC-IV, PEPE, BSP, STARAN, MPP, DAP and the Connection
Machine (CM-1).
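The lockstep, one-instruction-over-many-data idea behind SIMD can be imitated in software with array (vector) operations. The NumPy sketch below is only a conceptual analogue, not real SIMD hardware : a single expression is written once and applied to every element of the arrays, much as a SIMD control unit broadcasts one instruction to all processing elements.

# Conceptual analogue of SIMD : one operation applied to many data elements at once.
import numpy as np

a = np.arange(8, dtype=np.float64)   # data elements 0..7
b = np.full(8, 2.0)

# Scalar (SISD-like) version : one element handled per step.
c_scalar = [a[i] * b[i] + 1.0 for i in range(len(a))]

# Vectorised (SIMD-like) version : the same multiply-add expressed once
# and applied to all elements of the arrays together.
c_vector = a * b + 1.0

print(c_scalar)
print(c_vector)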

c) Multiple Instruction, Single Data (MISD)


It is a type of parallel computer with multiple instructions over single data stream.
Each processing unit operates on the data independently via separate instruction stream.
Single data stream is fed into multiple processing units, as shown in Fig. 1.4.6. Examples
of MISD are multiple frequency filters operating on a single signal stream and multiple
cryptography algorithms attempting to crack a single coded message.

Fig. 1.4.6 : MISD architecture.

d) Multiple Instruction, Multiple Data (MIMD)


This is the most powerful type of parallel computer, in which every processor executes a different instruction stream on a different data stream, as shown in Fig. 1.4.7.


Fig. 1.4.7 : MIMD architecture.

As MIMD computers are able to run independent programs, many tasks can be
performed at the same time. The execution in MIMD can be synchronous or
asynchronous, deterministic or non-deterministic. In the real sense MIMD architecture is
said to be a parallel computer. Examples of MIMD are most current multicore computers,
multiprocessor computers, networked computer clusters and supercomputers.

1.4.2.2 Shared Memory Architecture for Parallel Computers

An important characteristic of the shared memory architecture is that there is more than one processor and all processors share the same memory with a global address space. The processors operate independently but share the same memory resources, and changes made to a memory location by one processor are visible to all other processors. Based upon memory access time, shared memory architectures are further classified into uniform memory access (UMA) and non-uniform memory access (NUMA) architectures, which are discussed as follows :
1. Uniform memory access (UMA) : A UMA architecture comprises two or more processors with identical characteristics. UMA architectures are also called symmetric multiprocessors. The processors share the same memory and are interconnected by a shared-bus interconnection scheme, so the memory access time is almost the same for all of them. The IBM S/390 is an example of a UMA architecture, which is shown in Fig. 1.4.8 (a).


2. Non-uniform memory access (NUMA) : This architecture uses one or more symmetric multiprocessors that are physically linked. A portion of memory is allocated to each processor, so access to local memory is faster than access to remote memory. In this mechanism, not all processors get equal access time to the memory connected through the interconnection network, and memory access across the link is always slower. The NUMA architecture is shown in Fig. 1.4.8 (b).

Fig. 1.4.8
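As a small software analogue of the shared-memory model, the sketch below lets several threads read and write one memory location visible to all of them, with a lock guarding concurrent updates. It illustrates the shared-address-space idea only; it does not model UMA or NUMA hardware behaviour.

# Sketch : several threads sharing one memory location (a shared counter).
import threading

counter = 0                 # shared data, visible to all threads
lock = threading.Lock()     # protects concurrent updates

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:          # changes made here are visible to every thread
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)              # 40000 : all threads updated the same memory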

1.4.2.3 Distributed Memory Architecture for Parallel Computers

In the distributed memory system, the concept of global memory is not used as each
processor uses its own internal (local) memory for computing.

Fig. 1.4.9 : Distributed memory architecture.

Therefore, changes made by one processor in its local memory have no effect on the memory of other processors, and memory addresses in one processor cannot be mapped to other processors. Distributed memory systems require a communication network to connect inter-processor memory, as shown in Fig. 1.4.9. The distributed memory architecture is also called the message-passing architecture. The speed and performance of this type of architecture depend upon the way the processors are connected.
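A minimal message-passing sketch in the spirit of the distributed memory architecture above, written with the mpi4py bindings for MPI (one of the communication libraries named earlier for clusters). It assumes an MPI installation and would be launched with something like mpiexec -n 2 python demo.py; each process holds only its own local data and exchanges it explicitly as messages.

# Sketch : two processes with separate local memories exchanging data by messages.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()                     # each process has its own rank and memory

if rank == 0:
    local_data = {"partial_result": 42}    # exists only in process 0's memory
    comm.send(local_data, dest=1, tag=11)  # explicit message over the interconnect
elif rank == 1:
    received = comm.recv(source=0, tag=11) # copied into process 1's local memory
    print("process 1 received:", received)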


1.4.3 Distributed Computing


As per Tanenbaum, a distributed system is defined as a collection of independent computers that appears to its users as a single coherent system.

Fig. 1.4.10 : Conceptual view of distributed system

The term distributed computing encompasses any architecture or system that allows
the computation to be broken down into units and executed concurrently on different
computing elements. It is a computing concept that refers to multiple computer systems
connected in a network working on a single problem. In distributed computing, a single
problem is divided into many parts, and each part is executed by different computers. As
long as the computers are networked, they can communicate with each other to solve the
problem. If it is done properly, the computers perform like a single entity. The ultimate
goal of distributed computing is to maximize performance by connecting users and IT
resources in a cost-effective, transparent and reliable manner. This type of computing is
highly scalable. The Conceptual view of distributed system is shown in Fig. 1.4.10.
1.4.3.1 Architectural Models for Distributed System

The architectural model of a distributed system is related to the placement of different machines in a network intended to solve independent tasks. It defines how the diverse components of the system interact with each other and how they are mapped onto the underlying network of computers. There are mainly two architectural models for distributed systems, namely the client-server model and the peer-to-peer (P2P) model. The architecture of the client-server model is shown in Fig. 1.4.11 (a) and that of the peer-to-peer model in Fig. 1.4.11 (b).

Fig. 1.4.11

The client-server model is the most widely used architecture in distributed technologies, where the client process interacts with the server process to get access to shared resources. The client-server model is usually based on a simple request/reply protocol implemented with send/receive primitives, using communication middleware like remote procedure calls (RPC) and remote method invocation (RMI). In the client-server model, the client process requests a procedure running on the server machine over the underlying network. Once the server procedure receives the request, it is executed and the result is generated. The processed result is sent back to the client as a response by the server.
In the Peer-to-Peer (P2P) model, there is no distinction between the client and the server process. All computers in the network get the same privileges and run the same program with the same set of interfaces. The pattern of communication always depends on the type of application. The two disadvantages of P2P are its lack of scalability and its high complexity.


In a distributed system, multiple concurrent processes interact continuously with each other over the network. Interprocess communication (IPC) is the fundamental method of communication in distributed systems and underlies their design and implementation. IPC is used to synchronize the activities of processes locally and to exchange data between processes. There are several models used in distributed systems for remote communication between processes. Some of the most relevant models used for communication in distributed systems are Remote Procedure Call (RPC), Message Oriented Middleware (MOM) and Remote Method Invocation (RMI).
The Remote Procedure Call (RPC) is one of the most effective models used in modern distributed applications for remote communication. A remote procedure call is used to call a procedure residing on a remote machine through a local call and obtain the result. Although the procedure lies on the remote machine, it appears like a local one, which hides the communication mechanism.
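A small illustration of the RPC idea using Python's standard xmlrpc modules : the client invokes add() as if it were a local procedure, while the library hides the underlying request/reply exchange. The port number and the procedure itself are arbitrary choices for this example.

# Sketch : a remote procedure that looks like a local call to the client.
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client
import threading

def add(x, y):
    return x + y

# Server side : register a procedure that remote clients may call.
server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False, allow_none=True)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side : the remote procedure is invoked like a local call;
# the RPC layer marshals the request and waits for the reply underneath.
proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))   # prints 5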
The Message Oriented Middleware (MOM) is another model used to send and receive communication messages between multiple clients and servers. It uses a data structure like a queue to store and retrieve messages. The queuing mechanism between the clients and servers prevents messages from being misplaced, which can occur when the client sends messages faster than the receiver can consume them, or when the client sends a message while the receiver is not available. It is a purely asynchronous mechanism, where messages can be sent even if the receiver is not available. A popular example of MOM-based communication is an e-mail system, where the sender can send mail to a recipient who is not available at that moment.
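A toy sketch of the queue-based, asynchronous messaging style described above, using Python's standard queue module as a stand-in for a real MOM product : the sender can keep enqueuing messages even while the receiver is not yet consuming them.

# Sketch : asynchronous messaging through a queue (a stand-in for a MOM broker).
import queue
import threading
import time

mailbox = queue.Queue()                   # the middleware's message queue

def sender():
    for i in range(3):
        mailbox.put(f"message {i}")       # send and continue; no waiting for the receiver
    mailbox.put(None)                     # sentinel : nothing more to send

def receiver():
    time.sleep(1)                         # receiver comes online later
    while True:
        msg = mailbox.get()               # messages were kept safely in the queue
        if msg is None:
            break
        print("received:", msg)

threading.Thread(target=sender).start()
threading.Thread(target=receiver).start()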
The Remote Method Invocation (RMI) is another model, based on distributed object technology, where the objects are distributed and located by using an RMI registry. The client can access remote objects by using their interfaces. The main disadvantage of RMI is that it does not support heterogeneity and is compatible with the Java platform only. Thus, there was a need for a distributed object technology that supports heterogeneous hardware platforms, operating systems and network protocols.
Apart from the above models, two prominent distributed system architectures used in cloud computing are service-oriented architecture and web services. Service-oriented architecture (SOA) is an architectural style for building an enterprise solution based on services. It organizes a software system as a collection of interacting services. Applications built using the SOA style deliver functionality as services that can be used or reused when building applications or integrating within the enterprise or with trading partners. An SOA application is a composition of services that encapsulate a business process.

Web services are loosely coupled (platform independent), contractual components that communicate through XML-based (open standard) interfaces. A web service is composed of a set of operations that can be invoked by leveraging internet-based protocols. It provides a method for supporting operations with parameters and returning their values using simple and complex types. The semantics of web services are expressed through an interoperable XML-based protocol called SOAP (Simple Object Access Protocol). SOAP is the communication protocol used in web services, with request-reply primitives. The services are described in a standardized XML document called WSDL (Web Service Description Language), which expresses simple and complex types in a platform-independent manner. In web services, the UDDI (Universal Description, Discovery and Integration) registry is used for registering the services, which are published by providers and called by consumers.
Web services and Web 2.0 are the fundamental building blocks for cloud computing. The front end of recent cloud platforms is mostly built using Web 2.0 and related technologies, while cloud services are delivered through web services or SOA-based technologies.

1.5 Cloud Characteristics


NIST has defined five essential characteristics of cloud computing, which are explained as follows.
 On-demand self-service : Each consumer can separately provision computing capabilities like server time, compute resources, network and storage, as needed, automatically, without requiring human interaction with the service provider.
 Broad network access : Cloud capabilities are available over the network and are provisioned through standardized network mechanisms that support heterogeneous client platforms like thin or thick clients, mobile phones, tablets, laptops, and workstations.
 Resource pooling : The cloud service provider's computing resources are pooled together to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned as per consumer demand. Examples of resources include storage, processing, memory, and network bandwidth. These resources are provided in a location-independent manner, where the customer generally has no control or knowledge over the exact location of the provided resources.


 Rapid elasticity : In the cloud, different resource capabilities can be elastically provisioned and released automatically as per demand; this elasticity is required to scale rapidly outward and inward. To the consumer, the capabilities available for provisioning appear to be unlimited and can be acquired in any quantity at any time.
 Measured service : Cloud systems automatically control and optimize resource use by consumers. This is done by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). The cloud system provides a mechanism for measuring the usage of resources for monitoring, controlling, and billing purposes, and usage is reported to provide transparency for both the provider and the consumer of the utilized service.
Apart from these, some other characteristics of cloud computing are given as follows :
a) Cloud computing mostly uses open-source REST-based APIs (Application Programming Interfaces) built on web services, which are universally available and allow users to access cloud services through a web browser easily and efficiently.
b) Most cloud services are location independent and can be provisioned at any time, from anywhere and on any device through the internet.
c) It provides agility to improve the reuse of cloud resources.
d) It provides end-user computing, where users have their own control over the resources they use, as opposed to the control of a centralized IT service.
e) It provides a multi-tenancy environment for sharing a large pool of resources among users, with additional features like reliability, scalability, elasticity, security, etc.

1.6 Elasticity in Cloud


Cloud computing has one important characteristic called "elasticity". Elasticity is very important for mission-critical or business-critical applications, where any compromise in performance may lead to huge business loss. Elasticity comes into the picture when additional resources have to be provisioned for such applications to meet performance requirements and demands. It works in such a way that when the number of users accessing an application increases, the application is automatically provisioned with extra computing, storage and network resources (like CPU, memory, storage or bandwidth), and when fewer users are present, those resources are automatically decreased as per requirement. Elasticity in the cloud is a popular feature associated with scale-out solutions (horizontal scaling), which allow resources to be dynamically added or removed when needed. It is generally associated with public cloud resources and is commonly featured in pay-per-use or pay-as-you-go services.

Elasticity is the ability to grow or shrink infrastructure resources (like compute, storage or network) dynamically, as needed, to adapt to workload changes in applications in an autonomic manner. It maximizes resource utilization, which results in overall savings in infrastructure costs. Depending on the environment, elasticity can be applied to resources in the infrastructure, not limited to hardware, software, connectivity, QoS and other policies. Elasticity depends heavily on the environment; sometimes it may even become a negative trait, for example where certain applications must have guaranteed performance.

Elasticity is widely used in IT organizations : during peak hours, when most employees are working on the cloud (say between 9 AM and 9 PM), the resources are scaled up to the highest mark, while during non-peak hours, when fewer employees are working (say between 9 PM and 9 AM), the resources are scaled down to the lowest mark; a discrete bill is generated for the low and high usage, which saves huge cost. Another example of elasticity is the Indian Railways train booking service, IRCTC. Earlier, during the tatkal booking period, the website used to crash because the servers could not handle the very large number of user requests for booking tickets at a specific time. Nowadays this does not happen, because the elasticity provided by the cloud automatically scales the infrastructure resources up as user requests grow during the tatkal booking period, so the website never stops in between, and scales them down when fewer users are present. This provides huge flexibility and reliability for the customers using the service.
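The scale-out/scale-in behaviour described above can be pictured as a simple control loop. The sketch below is schematic : the thresholds are made-up values, and current_load(), add_server() and remove_server() are hypothetical helpers standing in for a provider's monitoring and provisioning APIs.

# Schematic sketch of elasticity : grow or shrink capacity as the load changes.
# current_load(), add_server() and remove_server() are hypothetical helpers.

MIN_SERVERS, MAX_SERVERS = 2, 20
SCALE_OUT_AT, SCALE_IN_AT = 0.75, 0.25      # average utilisation thresholds (made up)

def autoscale(current_load, add_server, remove_server, servers):
    load = current_load()                   # e.g. average CPU utilisation, 0.0 - 1.0
    if load > SCALE_OUT_AT and servers < MAX_SERVERS:
        add_server()                        # peak hours : provision extra capacity
        servers += 1
    elif load < SCALE_IN_AT and servers > MIN_SERVERS:
        remove_server()                     # off-peak : release capacity, stop paying for it
        servers -= 1
    return servers

# A monitoring loop would call autoscale() periodically, for example:
# while True:
#     servers = autoscale(current_load, add_server, remove_server, servers)
#     time.sleep(60)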

1.7 On-demand Provisioning


On-demand provisioning is another important benefit provided by cloud computing. Public cloud services are available globally through the internet. Cloud Service Providers (CSPs) provide their cloud services through a single self-service portal, where customers can pick the specific services they want to use in their enterprise. As the delivery of cloud services happens over the internet, the services are available on demand, everywhere, through a self-service portal.

On-demand provisioning in cloud computing refers to the process for the automated deployment, integration and consumption of cloud resources or services by individuals or enterprise IT organizations. It incorporates the policies, procedures and objectives of an enterprise in sourcing cloud services and solutions from a cloud service provider. On-demand provisioning is used to provision cloud services dynamically, on demand, along with resources like hardware, software, data sets or servers running several choices of operating systems with customized software stacks, over the internet.

It is essentially developed to meet a common challenge in computing : the fluctuating demand for resources faced by enterprises. As the demand for compute resources in an enterprise varies drastically from time to time, on-demand provisioning enables enterprises to access additional resources from anywhere, at any time and on any supported device, dynamically.
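To make the idea of self-service, on-demand provisioning concrete, the sketch below shows what a request to a provisioning endpoint might look like. Every detail in it (the URL, machine size, image name and header values) is hypothetical; actual request formats differ from provider to provider.

# Sketch : requesting a new server on demand from a hypothetical self-service API.
import json
import urllib.request

PROVISION_URL = "https://portal.example-cloud.test/v1/servers"   # placeholder endpoint

order = {
    "name": "web-server-01",
    "size": "2vcpu-4gb",            # hypothetical machine size
    "image": "ubuntu-22.04",        # requested operating system image
    "region": "ap-south",
}

request = urllib.request.Request(
    PROVISION_URL,
    data=json.dumps(order).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer <token>"},   # placeholder credential
    method="POST",
)
# with urllib.request.urlopen(request) as response:   # would submit the order
#     print(json.loads(response.read().decode("utf-8")))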

1.8 Challenges in Cloud Computing


Some of the challenges in cloud computing are explained as follows :

1.8.1 Data Protection


Data protection is a crucial element of security that warrants scrutiny. In the cloud, data is stored in remote data centers and managed by third-party vendors, so there is a fear of losing confidential data. Therefore, various cryptographic techniques have to be implemented to protect confidential data.

1.8.2 Data Recovery and Availability


In the cloud, a user's data is scattered across multiple data centers, so recovery of that data is very difficult : the user never knows the exact location of the data and does not know how to recover it. The availability of cloud services is closely associated with the downtime of the services, which is specified in an agreement called the Service Level Agreement (SLA). Therefore, any compromise in the SLA may lead to increased downtime, lower availability and harm to business productivity.

1.8.3 Regulatory and Compliance Restrictions


Many countries have compliance restrictions and regulations on the usage of cloud services. Government regulations in such countries do not allow providers to share customers' personal information and other sensitive information outside the state or country. In order to meet such requirements, cloud providers need to set up a data center or a storage site exclusively within that country to comply with the regulations.

1.8.4 Management Capabilities


The involvement of multiple cloud providers for in-house services may lead to difficulty in management.

1.8.5 Interoperability and Compatibility Issue


The services hosted by an organization should have the freedom to migrate into or out of the cloud, which is very difficult in public clouds. The compatibility issue arises when an organization wants to change its service provider. Most public clouds provide vendor-dependent APIs for access and may have their own proprietary solutions, which may not be compatible with those of other providers.
Summary

 Cloud computing has become a must-have technology in every IT organization because of its prominent features over existing computing technologies. It is often compared with other computing architectures like peer-to-peer, client-server, grid computing, distributed computing and cluster computing.
 A peer-to-peer architecture is a collection of hosts connected in a network intended for resource sharing, task processing and communication, while in a client-server architecture there is at least one specialized server that controls the communication between multiple clients.
 The Grid Computing architecture has geographically distributed computing
resources which work together to perform a common task.
 In distributed computing, a single problem is divided into many parts, and each
part is executed by different computers while a cluster is a group of loosely
coupled homogenous computers that work together closely, so that in some
respects they can be regarded as a single computer.
 According to NIST, Cloud computing is a model for enabling ubiquitous,
convenient, on-demand network access to a shared pool of configurable
computing resources (e.g., networks, servers, storage, applications, and services)
that can be rapidly provisioned and released with minimal management effort
or service provider interaction.
 Computing in computer technology can be defined as the execution of single or
multiple programs, applications, tasks or activities, sequentially or parallelly on
one or more computers.
 The hardware architectures for parallel processing are of four types, namely Single Instruction, Single Data (SISD), Single Instruction, Multiple Data (SIMD), Multiple Instruction, Single Data (MISD) and Multiple Instruction, Multiple Data (MIMD).

 As per Tanenbaum, a distributed system is defined as a collection of independent computers that appears to its users as a single coherent system.
 There are several models used in distributed systems for communication; the most relevant are Remote Procedure Call (RPC), Message Oriented Middleware (MOM) and Remote Method Invocation (RMI).
 Web services and Web 2.0 are the fundamental building blocks for cloud computing. The front end of recent cloud platforms is mostly built using Web 2.0 and related technologies, while cloud services are delivered through web services or SOA-based technologies.
 The Elasticity is the ability to grow or shrink infrastructure resources
dynamically as needed to adapt to workload changes in the applications in an
autonomic manner while on-demand provisioning is used to provision the cloud
services dynamically on demand along with resources like hardware, software,
data sets or servers running several choices of operating systems with
customized software stacks over the internet.
Short Answered Questions

Q.1 Define cloud computing.


Ans. : According to NIST, Cloud computing is a model for enabling ubiquitous,
convenient, on-demand network access to a shared pool of configurable computing
resources (e.g., networks, servers, storage, applications, and services) that can be
rapidly provisioned and released with minimal management effort or service provider
interaction.
Q.2 Enlist the pros and cons of cloud computing. SPPU : Dec.-19
Ans. : The pros and cons of cloud computing are

Pros of Cloud computing


 Improved accessibility
 Optimum Resource Utilization

 Scalability and Speed

 Minimizes licensing cost of the software

 On-demand self-service

 Broad network access

 Resource pooling

 Rapid elasticity


Cons of Cloud computing


 Security

 Privacy and Trust

 Vendor lock-in

 Service Quality

 Cloud migration issues

 Data Protection

 Data Recovery and Availability

 Regulatory and Compliance Restrictions

 Management Capabilities

 Interoperability and Compatibility Issue

Q.3 What are different characteristics of cloud computing ?


Ans. : The characteristics of cloud computing are

 On-demand self-service : A consumer can unilaterally provision computing


capabilities, such as server time and network storage, as needed automatically
without requiring human interaction with each service provider.
 Broad network access : Capabilities are available over the network and accessed
through standard mechanisms that promote use by heterogeneous thin or thick client
platforms (e.g., mobile phones, tablets, laptops, and workstations).
 Resource pooling : The provider’s computing resources are pooled to serve multiple
consumers using a multi-tenant model, with different physical and virtual resources
dynamically assigned and reassigned according to consumer demand. There is a
sense of location independence in that the customer generally has no control or
knowledge over the exact location of the provided resources but may be able to
specify location at a higher level of abstraction (e.g., country, state, or datacenter).
Examples of resources include storage, processing, memory, and network
bandwidth.
 Rapid elasticity : Capabilities can be elastically provisioned and released, in some
cases automatically, to scale rapidly outward and inward commensurate with
demand. To the consumer, the capabilities available for provisioning often appear to
be unlimited and can be appropriated in any quantity at any time.
 Measured service : Cloud systems automatically control and optimize resource use
by leveraging a metering capability at some level of abstraction appropriate to the
type of service (e.g., storage, processing, bandwidth, and active user accounts).


Resource usage can be monitored, controlled, and reported, providing transparency


for both the provider and consumer of the utilized service.
Q.4 Explain the term “Elasticity in cloud computing”.
Ans. : Elasticity is very important for mission-critical or business-critical applications, where any compromise in performance may lead to huge business loss. Elasticity comes into the picture when additional resources must be provisioned for such applications to meet their performance requirements. It works in such a way that when the number of user accesses increases, the application is automatically provisioned with extra computing, storage and network resources like CPU, memory, storage or bandwidth, and when fewer users are present those resources are automatically released as per requirement. Elasticity in the cloud is a popular feature associated with scale-out solutions (horizontal scaling), which allow resources to be dynamically added or removed when needed. It is generally associated with public cloud resources and is commonly featured in pay-per-use or pay-as-you-go services.
Elasticity is the ability to grow or shrink infrastructure resources (like compute, storage or network) dynamically as needed to adapt to workload changes in the applications in an autonomic manner. It gives maximum resource utilization, which results in savings in overall infrastructure cost. Depending on the environment, elasticity is applied to resources in the infrastructure and is not limited to hardware; it also covers software, connectivity, QoS and other policies. Elasticity depends completely on the environment, and it can sometimes become a negative trait for applications that must have guaranteed performance.
Q.5 What is on-demand provisioning ?
Ans. : On-demand provisioning is an important benefit provided by cloud computing. On-demand provisioning in cloud computing refers to the process of deployment, integration and consumption of cloud resources or services by individuals or enterprise IT organizations. It incorporates the policies, procedures and the enterprise's objectives in sourcing cloud services and solutions from a cloud service provider. On-demand provisioning is used to provision cloud services dynamically on demand, along with resources like hardware, software, data sets or servers running several choices of operating systems with customized software stacks, over the internet. Public cloud services are available globally through the internet. The cloud service providers (CSPs) provide their cloud services through a single self-service portal where customers can pick whichever specific service they want to use in their enterprise. As delivery of cloud services happens over the internet, they are available on demand everywhere through a self-service portal.
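As a hedged illustration, a public cloud SDK such as AWS boto3 turns this kind of provisioning into a single API call. The image ID and instance type below are placeholder values, and valid credentials would be needed for the call to succeed.

import boto3   # AWS SDK for Python; other public cloud SDKs work similarly

ec2 = boto3.client("ec2", region_name="ap-south-1")

# Request one virtual server on demand; it is billed only while it runs.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder machine image ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])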
Q.6 Differentiate between Grid and Cloud Computing. SPPU : Dec.-17

Ans. :
Feature | Grid computing | Cloud computing
Computing architecture | Distributed computing | Client-server computing
Scalability | Low to moderate | High
Flexibility | Less | More
Management | Decentralized | Centralized
Owned and managed by | Organizations | Cloud service providers
Provisioning | Application-oriented | Service-oriented
Accessibility | Through grid middleware | Through standard web protocols
Resource allocation | Pre-reserved | On-demand
Speed | Slow | Fast
Resource management | Distributed | Centralized
Cost | High | Low

Q.7 Highlight the importance of Cloud Computing. SPPU : Dec.-16


Ans. : Cloud computing is important in every business and applications due to the
following advantages
 Scalability and Speed : Enterprises do not have to invest money and time behind
buying and setting up the hardware, software and other resources. They can quickly
scale up or scale down their resources and services running on the Cloud as per
demand with rapid speed of access.
 Minimizes licensing Cost of the Softwares : The remote delivery of Software
applications saves licensing cost such that users do not need to buy or renew
expensive software licenses or programs.
 Less personnel training : Users of the cloud do not need much personal training to access or deploy cloud services, as cloud portals are designed to be user friendly.
 Flexibility of work practices : Cloud computing provides freedom of access to their
users such that the employees can work more flexibly in their work practices. The


flexibility of access to the cloud data allow employees to work from home or on
holiday.
 Sharing of resources and costs : The cloud computing fulfills the requirement of
users to access resources through a shared pool which can be scaled easily and
rapidly to any size. The sharing of resources saves huge cost and makes efficient
utilization of the infrastructure.
 Minimize spending on technology infrastructure : As public cloud services are
readily available and the Pay as you go feature of cloud allows you to access the
cloud services economically. Therefore, it reduces the spending on in-house
infrastructure.
 Maintenance is easier : As cloud computing services are provided by service
provider through internet, the maintenance of services is easier and managed by
cloud service providers itself.
 Less Capital Expenditure : There is no need to spend big money on hardware,
software or licensing fees so capital expenditure is very less.
 On-demand self-service : The cloud provides automated provisioning of services on
demand through self-service websites called portals.
 Server Consolidation : The increased resource utilization and reduction in power
and cooling requirements achieved by server consolidation are now being expanded
into the cloud. Server consolidation is an effective approach to maximize resource
utilization while minimizing energy consumption in a cloud computing
environment.
 Energy Resource Management : Significant saving in the energy of a cloud data
center without sacrificing SLA are an excellent economic incentive for data center
operators and would also make a significant contribution to greater environmental
sustainability.
Q.8 Enlist any two advantages of distributed systems. SPPU : Dec.-18
Ans. : Advantages of Distributed System are
 Supports heterogeneous hardware and software.
 The resources shared in the distributed system are easily accessible to the users
across the network.
 The distributed system is scalable such a way that if the number of users or
computers increases, the performance of the system does not get affected.
 It is capable to detect and recover from failure, that is, it should be fault tolerant and
robust.

 It provides different levels of transparency like


(a) Access transparency deals with providing efficient access to the system by hiding
the implementation details.
(b) Location transparency deals with providing access to the resources irrespective
of their location, that is, it hides the location of the servers.
(c) Migration transparency makes sure that even if the servers are migrated from
one location to the other, they will not affect the performance of system.
(d) Replication transparency deals with hiding the replication of data related to the
backup.
(e) Failure transparency makes sure that any failure to the system will not affect
availability of the system.
(f ) Relocation transparency hides the resources when they are relocated to the other
location.
(g) Concurrent transparency deals with providing concurrent access to the shared
resources efficiently.
(h) Performance transparency deals with providing improvement in the system to
achieve high performance.
(i) Network transparency deals with providing transparent network
communication.
Q.9 Define SOA. SPPU : Dec.-18
Ans. : The Service Oriented Architecture (SOA) is an architectural style for building an
enterprise Solution based on Services. It maintains a software system into a collection of
interacting services. Applications built using an SOA style deliver functionality as
services that can be used or reused when building applications or integrating within the
enterprise or trading partners. An SOA application is a composition of services that
encapsulate a business process.
Q.10 What is web service ? SPPU : Dec.- 18
Ans. : The Web services are loosely coupled (platform independent), contractual
components that communicate in XML-based (open standard) interfaces. The Web
service composed of set of operations that can be invoked by leveraging Internet-based
protocols. It provides method operations supporting parameters and return values with
complex and simple types.


Long Answered Questions

Q.1 Illustrate the evolution of distributed computing to grid and cloud computing.
SPPU : Dec.-19
Ans. : Evolution of distributed computing to grid and cloud computing
At the initial stage of computing standalone computers were used to solve large
complex tasks in a sequential manner called serial computing. In serial computing,
large problems were divided in to number of smaller tasks which were solved serially
or sequentially on standalone computers. The limitations of serial computing were slower computing performance, low transmission speed and hardware constraints. Therefore, the serial computing approach evolved into centralized computing, where a centralized server is used for computation.
The centralized computing is a type of computing architecture where all or most of
the processing/computing is performed on a central server. In this type, all the
computing resources are interconnected in a centralized manner to a single physical
system called a server. Resources like processors, memories and storage are shared under a single integrated operating system. But this technique is limited because the centralized server becomes a bottleneck and a single point of failure. Therefore, the parallel and distributed computing approaches came into the picture, where multiple networked computers are used to solve large-scale problems.
In parallel computing, a complex and large-scale problem is broken into discrete
parts that can be solved concurrently. Each part is further broken down into a series of
instructions which are executed simultaneously on different processors. The execution
time is highly reduced in parallel computing as compared to serial computing because
of parallel execution. As multiple processors and memories are involved in the parallel
execution, management of memory addresses and processors address space is quite
difficult.
The distributed computing is evolved with the evolution of network where
multiple computers interconnected by the network are used to solve a complex
problem. It is opposite to centralized computing. It has a collection of independent
computers interconnected by the network which is used for executing high
computational job and appears to their users as a single coherent system. Like parallel
computing, a large problem is split into multiple tasks and each task is given to each
computer in a group for execution. Each computer in a group is equipped with an
independent processor, a local memory, and interfaces. The Communication between
any pair of nodes is handled by message passing, as no common memory is available. The main advantage of distributed computing is location independence, as multiple computers in a group at different geographic locations are used to solve a large problem. Distributed computing then evolved into grid computing.

The Grid Computing architecture has geographically distributed computing


resources which work together to perform a common task. A typical grid has pool of
loosely coupled computers who worked together to solve a complex computational
problem. The grid computing has heterogeneous resources which are controlled by a
common node called control node like client server architecture. Conceptually grid
computing works similar to cloud computing for providing services through a shared
pool of resources. The grid computing follows a distributed architecture while cloud
computing follows centralized distributed architecture. In Grid computing the
compute resources are distributed across cities, countries, and continents. Therefore,
they are managed completely in a distributed manner. Grid computing then evolved into cloud computing, which is a must-have technology nowadays. Cloud computing makes use of distributed and parallel computing together. It is also known as on-demand computing. Cloud computing is the dynamic delivery of IT resources, like hardware, software and other capabilities, as a service over the network. The cloud can be built by integrating physical or virtualized resources together in large datacenters.
Q.2 Outline the similarities and differences between distributed computing, Grid
computing and Cloud computing. SPPU : Dec.-18
Ans. : The similarities and differences between distributed computing, Grid computing
and Cloud computing are explained as follows.
a) Distributed Computing :
It is a computing concept that refers to multiple computer systems working on a
single problem. In distributed computing, a single problem is divided into many parts,
and each part is executed by different computers. As long as the computers are
networked, they can communicate with each other to solve the problem. If it is done
properly, the computers perform like a single entity. The ultimate goal of distributed
computing is to maximize performance by connecting users and IT resources in a cost-
effective, transparent and reliable manner. This type of computing is highly scalable.
b) Grid computing
The Grid Computing architecture has geographically distributed computing
resources which work together to perform a common task. A typical grid has pool of
loosely coupled computers who worked together to solve a complex computational
problem. The grid computing has heterogeneous resources which are controlled by a
common node called control node like client server architecture.
Conceptually grid computing works similar to cloud computing for providing
services through a shared pool of resources. The grid computing follows a distributed


architecture while cloud computing follows centralized distributed architecture. In


Grid computing the compute resources are distributed across cities, countries, and
continents. Therefore, they are managed completely in a distributed manner while in
cloud computing although resources are distributed but they are managed centrally.
The Cloud computing is advantageous over Grid computing in terms of availability,
scalability, flexibility, disaster recovery and load balancing.
c) Cloud computing
The Cloud computing is highly correlated with other computing models like grid
computing, distributed computing and parallel computing which are coupled with
virtualization. The goal of cloud computing is to make better use of distributed hardware and software resources, which are combined together to achieve higher throughput at lower cost and are able to solve large-scale computation problems in less time. Basically, cloud computing is an aggregation of computing resources (like CPUs and memories), networking solutions, storage management and virtualization solutions which are available on demand and delivered economically.
 Difference between Distributed computing, Grid computing and Cloud computing

Feature | Distributed computing | Grid computing | Cloud computing
Computing architecture | Client-server and peer-to-peer computing | Distributed computing | Client-server computing
Scalability | Low to moderate | Low to moderate | High
Flexibility | Moderate | Less | More
Management | Decentralized | Decentralized | Centralized
Owned and managed by | Organizations | Organizations | Cloud service providers
Provisioning | Application and service oriented | Application oriented | Service oriented
Accessibility | Through communication protocols like RPC, MoM, IPC, RMI | Through grid middleware | Through standard web protocols
Resource allocation | Pre-reserved | Pre-reserved | On-demand
Speed | Moderate | Slow | Fast
Resource management | Centralized and distributed | Distributed | Centralized
Cost | Moderate to high | High | Low

Q.3 Explain the evolution of Cloud Computing.


Ans. : Refer section 1.3.

Q.4 Describe the hardware architectures for parallel processing.


Ans. : Refer section 1.4.2.1.



2 Cloud Enabling Technologies
Syllabus
Service Oriented Architecture - REST and Systems of Systems - Web Services - Publish -
Subscribe Model - Basics of Virtualization - Types of Virtualization - Implementation Levels of
Virtualization - Virtualization Structures - Tools and Mechanisms - Virtualization of CPU -
Memory - I/O Devices - Virtualization Support and Disaster Recovery.

Contents
2.1 Service Oriented Architecture
2.2 REST and Systems of Systems
2.3 Web Services
2.4 Publish-Subscribe Model
2.5 Basics of Virtualization
2.6 Types of Virtualization
2.7 Implementation Levels of Virtualization
2.8 Virtualization Structures
2.9 Virtualization Tools and Mechanisms
2.10 Virtualization of CPU
2.11 Virtualization of Memory
2.12 Virtualization of I/O Device
2.13 Virtualization Support and Disaster Recovery


2.1 Service Oriented Architecture


The Service Oriented Architecture (SOA) expresses a perspective of software
architecture that defines the use of loosely coupled software services to support the
requirements of the business processes. It is used for designing a software system that can
make the use of services of new or legacy applications through their published or
discoverable interfaces. It is nothing but a collection of services that communicate with each other using service interfaces. In an SOA environment, resources on a network are
made available as an independent service that can be accessed without knowledge of
their underlying platform implementation. The applications built using SOA are often
distributed over the networks which aim to make services interoperable, extensible and
effective. The architecture styles of SOA provide service loose coupling, published
interfaces, and a standard communication model. The SOA is also useful in building of
Grid and Cloud applications. The architecture style of SOA is defined by the World Wide
Web Consortium (W3C) based on the three parameters namely logical perspective,
message perspectives and description orientation.
The logical perspective or view of SOA explains how the applications, business
processes, services or databases perform a business-level operation and how the messages
are exchanged between provider agents and consumer agents.
The message perspective explains the insight of messages including internal structure
of providers and consumer’s message, their implementation languages, process construct,
database structure and so on. These features are needed for representing the abstracted
view of SOA.
The description orientation explains about the machine executable metadata. The
services in SOA are described by its metadata. The descriptor in metadata defines the
public nature of the SOA. It allows exposing the specific details of services to the public
and others are kept hidden. It allows documenting the semantics of a service directly or
indirectly, as per descriptor rule. It also maintains a granularity of services that are
intended to utilize the small number of operations with relatively large and complex
messages. The messages in SOA are always be platform neutral which are generated in a
standardized format and delivered through the XML based interfaces.
The SOA architecture is dissimilar to component-based models, as component-based models use tightly coupled components for designing and developing applications based on technologies such as CORBA (Common Object Request Broker Architecture) or DCOM (Distributed Component Object Model). SOA centers on a loosely coupled architecture for building software applications that uses common protocols and technologies like HTTP and XML. It is identified with early efforts on the architectural style of distributed systems, especially Representational State Transfer (REST). Even today, REST offers an alternative to the complex, standards-driven web services technology and is used in many Web 2.0 services.

2.1.1 Architecture of SOA


 The SOA provides methods for design, deployment, and management of services
that are accessible over the network and executable. In SOA, a service provides a
discrete business function that operates on data to ensure that business functionality
is applied consistently, predictable results are returned, and quality of service is
delivered. The generalized architecture of SOA has three components namely
service providers, service consumers and service registry.
 The service provider is responsible for publishing the services in to a registry and
provides access to those using API and interfaces for the consumers. The provider
defines Quality of services and security parameters through contract called service
level agreement.
 The service consumer is responsible for invoking and accessing the services
published by provider through standard interfaces and APIs. Whenever service
consumer invokes a service, initially it has to find it inside service registry using
interfaces. If it is found in registry, then the discovery details are provisioned to the
consumer through which consumer can access the service from service provider.
 The service registry stores the references of services published by provider and
allows consumers to locate and access those using references.

Fig. 2.1.1 SOA architecture


 The Middleware like Enterprise Service Bus (ESB) provides an infrastructure for
integrating legacy applications and provide services for message translation,
message transformation, protocol conversion, message routing with QoS and
security services. The typical SOA architecture is shown in Fig. 2.1.1.
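The interplay of the three roles can be pictured with a toy registry in Python. The classes below are only an illustrative sketch of publish, find and bind; they are not part of any SOA standard or product.

# Toy model of the SOA triangle : provider publishes, consumer finds and binds.
class ServiceRegistry:
    def __init__(self):
        self._services = {}                 # service name -> endpoint reference

    def publish(self, name, endpoint):      # used by the service provider
        self._services[name] = endpoint

    def find(self, name):                   # used by the service consumer
        return self._services.get(name)

class BillingService:                       # a provider-side implementation
    def invoke(self, order_id):
        return "invoice generated for order " + order_id

registry = ServiceRegistry()
registry.publish("BillingService", BillingService())   # publish step

service = registry.find("BillingService")              # find step
if service is not None:
    print(service.invoke("ORD-101"))                   # bind and invoke

In a real SOA deployment the registry would be a UDDI or similar directory, the endpoint would be a network address, and the quality of service would be governed by a service level agreement.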

2.1.2 Characteristics of SOA


The different characteristics of SOA are as follows :
o Provides interoperability between the services.
o Provides methods for service encapsulation, service discovery, service composition,
service reusability and service integration.
o Facilitates QoS (Quality of Services) through service contract based on Service Level
Agreement (SLA).
o Provides loosely coupled services.
o Provides location transparency with better scalability and availability.
o Ease of maintenance with reduced cost of application development and
deployment.
The next sections cover the introduction to REST along with web services and publish-
subscribe model in detail.

2.2 REST and Systems of Systems


Representational State Transfer (REST) is a software architectural style for distributed
system that defines a set of constraints to be used for creating Web based services. It is
meant to provide interoperability between systems based on services running on the Internet. REST was defined by Roy Fielding (an author of the HTTP specification) in his PhD
dissertation on "Architectural Styles and the Design of Network-based Software
Architectures". Today, it is being used by many of IT enterprises including Yahoo,
Google, Amazon, IBM as well as social networking sites such as Twitter, Facebook, and
LinkedIn etc.
The web services that follow the REST architectural style are called RESTful Web
services. The RESTful web services allow the requesting systems to access and
manipulate textual representations of web resources by using a uniform and predefined
set of stateless operations. The generalized interaction in REST with HTTP specification is
shown in Fig. 2.2.1.


Fig. 2.2.1 Interaction in REST with HTTP specification


The REST architectural style has four basic principles which are explained as follows :
a) Resource identification
 In RESTful web services, the set of resources are often exposed by the publishers
over the internet which are accessed by the clients through interaction mechanisms.
The key component for information abstraction in REST is a resource. A resource
can be any information stored in a document, image or temporal storage which uses
conceptual mapping to a set of entities. Each resource in a REST has a unique name
identified by a Uniform Resource Identifier (URI), similar to a URL on the web. The URI provides a global addressing space for the resources involved in an interaction between components and facilitates service discovery. URIs can be bookmarked or exchanged through hyperlinks, which gives greater readability.
b) Controlled Interfaces
In RESTful web services, the interaction happens through client/server protocols based on the HTTP standard. The primitives used to perform manipulation are a fixed set of four CRUD (Create, Read, Update, Delete) operations, which are implemented using HTTP's PUT, GET, POST and DELETE methods. The operations of the REST methods are given in Table 2.2.1.
Method | Operation
PUT | Create a new resource
GET | Retrieve the current state of a resource
POST | Update or transfer a new state to a resource
DELETE | Delete or destroy a resource
Table 2.2.1 REST Methods
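As a hedged example, the snippet below exercises these methods with Python's requests library against a hypothetical resource URI; the URL and JSON fields are illustrative only and follow the method convention of Table 2.2.1.

import requests

BASE = "https://api.example.com/books"     # hypothetical RESTful endpoint

# Create a new resource (PUT, as per Table 2.2.1)
requests.put(BASE + "/101", json={"title": "Cloud Computing"})

# Retrieve the current state of the resource
book = requests.get(BASE + "/101").json()

# Transfer a new state to the resource
requests.post(BASE + "/101", json={"title": "Cloud Computing", "edition": 2})

# Destroy the resource
requests.delete(BASE + "/101")

Each call is stateless : every request carries all the information the server needs, which is what allows such services to be cached, clustered and load balanced.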


c) Self-Descriptive Messages
 A REST message contains brief description about message communication along
with the processing information. It enables intermediate users to process the
message without parsing the contents. The REST decouples the resources from their
representations such that their content can be accessed in a variety of standard
formats like HTML, XML, etc. It also provides the alternate representations of each
resource in multiple formats. The message also contains metadata that can be used
for detecting the transmission error, caching control, authentication, authorization,
and access control.

d) Stateless Communications
In REST, communication is mostly 'stateless', where messages do not rely on the state of the conversation. Stateless communication facilitates improved visibility, easier recovery from partial failures, and increased scalability. Its limitation is degraded network performance, because the same data is repeated with every request. However, some communications use stateful interactions which perform explicit state transfer, such as URI rewriting, hidden form fields or cookies. To point to the future state of the communication, the current state can be embedded in a response message. Stateless RESTful web services are mostly scalable in nature, as they can serve a very large number of clients while supporting caching mechanisms, clustering, and load balancing.
 The common example of REST web service is Amazon AWS which uses various
REST methods in its Simple Storage Service (S3). The Simple Storage Service uses
bucket as a medium for storing the objects also called items. For manipulating the
bucket, it makes HTTP requests to create, fetch, and delete buckets using PUT, GET,
POST and DELETE methods.
 The RESTful web services are mainly used in web 2.0 applications where the
mashup allows to combine the capabilities of one web application into another, for
example, taking the videos from online YouTube repository and put into a
Facebook page.

2.3 Web Services


With the SOA perspective, software capabilities are delivered and consumed by means of loosely coupled, reusable, coarse-grained, discoverable, and independent services interacting through a message-based communication model. The web has been a


medium for associating remote clients with applications for quite a long time, and more
recently, coordinating applications over the Internet has gained in popularity. The term
"web service" is frequently alluded to an independent, self-describing, modular
application intended to be utilized and accessible by other software applications over the
web. In general, Web services are loosely coupled (platform independent), contracted
components (behavior, input and output parameters, binding specifications are public)
that communicate in XML-based (open standard) interfaces. When a web service is
deployed, different applications and other web services can find and invoke the deployed
service. The functionality of web services is shown in Fig. 2.3.1.

Fig. 2.3.1 Functionality of Web services

In web services, service provider is responsible for developing and publishing the
various services into UDDI (Universal Description Discovery and Integration) registry
which can be accessed by different service consumers. When a consumer wants to invoke a service, it queries the UDDI registry to find the reference of the service. If the reference of the service registered by the service provider is available, the service is bound to the consumer who invoked it. During this phase the consumer gets access to the WSDL (Web Service Description Language) document, which describes the services published by the provider. After binding the service, the consumer can call the method with parameters using a SOAP request message, and the provider sends the result using a SOAP response message.


As web service is one of the most widely recognized examples of a SOA


implementation. The W3C defined a web service as a software framework intended to
support interoperable machine-to-machine collaboration over a network. The web service
has an interface described in a machine-executable format explicitly in Web Services
Description Language or WSDL. The important components of Web services are
explained as following.

a) Simple Object Access Protocol (SOAP)


SOAP is the basic XML-based communication protocol used by the service provider and consumer during the invocation process. It is an XML specification for transmitting data to and from a web service. It gives a standard packaging structure for the transmission of XML documents over different Internet protocols, for example HTTP, SMTP, and FTP. Because of the standardized messaging format, heterogeneous middleware frameworks can achieve interoperability. A SOAP message comprises a root element called the envelope, which contains a header and a body.
The SOAP header carries attributes of the message used in processing it and is an optional element. The SOAP body contains the XML data comprising the message being sent and is a mandatory element of a SOAP message. The SOAP header also provides extra application-level components for authentication, routing information, message parsing instructions, transaction management, and Quality of Service (QoS) configuration. The messages are marshalled by the SOAP engine at the provider's side and unmarshalled at the receiver's side based on the XML schema that describes the structure of the SOAP message.
The structure of a SOAP message is shown in Fig. 2.3.2.
Fig. 2.3.2 Structure of SOAP message
In a SOAP message, the SOAP envelope is the root element that contains the header and the body. The SOAP header is an optional component used to pass application-related information that is to be processed by SOAP nodes along the message path. The SOAP body is a mandatory component that contains information intended for the recipient. Fault is a special block within the body that indicates protocol-level errors.
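To make the envelope/header/body layout concrete, the snippet below posts a hand-written SOAP request with Python's requests library. The endpoint, namespace and operation are hypothetical and shown only to illustrate the message structure.

import requests

url = "https://example.com/services/OrderService"   # hypothetical SOAP endpoint

soap_request = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <!-- optional : authentication, routing or QoS information -->
  </soap:Header>
  <soap:Body>
    <GetOrderStatus xmlns="http://example.com/orders">
      <OrderId>ORD-101</OrderId>
    </GetOrderStatus>
  </soap:Body>
</soap:Envelope>"""

response = requests.post(
    url,
    data=soap_request.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8"},
)
print(response.status_code)   # a Fault, if any, would come back inside the Body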


b) Web Services Description Language (WSDL)


The WSDL is an XML-based document which describes the interfaces and the set of operations supported by a web service in a standardized format. It is used for standardizing the representation of input and output parameters along with its operations. In short, it is an XML document used for describing web services.
The WSDL document contains information on data
types to be used, messages to be exchanged, operations
performed by the web service and communication
protocol to be followed.
It also performs service’s protocol binding, and
describes the way in which the messages can be
transferred on the wire from one end to the other. The
WSDL defines the way in which clients can interact
with a web service. A generalized WSDL document
structure is shown in Fig. 2.3.3.
Fig. 2.3.3 WSDL document structure
In a WSDL document, Types represents a container for abstract type definitions defined using XML
Schema. A Message represents definition of an abstract message that may consist of
multiple parts; each part may be of a different type. The portType is an abstract set of
operations (which are input and output operations) supported by one or more endpoints
(commonly known as an interface). The operations supported by a portType are defined by an exchange of messages. The Binding is a concrete protocol and data format specification
for a particular portType and Service represents the collection of related endpoints, where
an endpoint is defined as a combination of a binding and an address (URI). The first three
elements (types, message, and portType) are all abstract definitions of the web service
interface and last two elements (binding and service) describe the concrete details of how
the abstract interface maps to messages on the wire.
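In practice a client rarely reads the WSDL by hand; a SOAP toolkit such as the third-party Python library zeep generates client stubs from it. The WSDL URL and operation name below are placeholders used only to show the idea.

from zeep import Client    # pip install zeep

# zeep downloads the WSDL, reads its types, messages, portType, binding and
# service sections, and exposes every declared operation as a Python method.
client = Client("https://example.com/services/OrderService?wsdl")  # placeholder URL

result = client.service.GetOrderStatus(OrderId="ORD-101")          # placeholder operation
print(result)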

c) Universal Description, Discovery, and Integration (UDDI)


The UDDI is a registry used by providers for publishing web services and by consumers for discovering them. A consumer can search for a specific web service by its
names, identifiers, categories, or the specification implemented by the web service
provider. It provides a set of rules for registering and retrieving information about a


business process and its services. The three basic functions of UDDI are Publish service
which shows how to register a web service, Find service which shows how a client finds a
web service and Bind service which shows how the client connects and interacts with a
web service. A UDDI registry is made up of XML-based service descriptors. Each service
descriptor contains the information needed to find and then bind to a particular web
service.
The SOAP is an extension, and an evolved version of XML-RPC. It uses remote
procedure call protocol with XML for encoding its calls and HTTP as a transport
mechanism. In XML-RPC, a call to the procedure is made by client and executed on the
server. The resultant value returned by the server is formatted in XML.
As XML-RPC was not completely aligned with the most recent XML standardization, it did not permit developers to extend the request or response format of an XML-RPC call. SOAP primarily describes the conventions between the interacting parties and leaves the data format of the exchanged messages to the XML schema. The significant contrast between web services and other technologies like CORBA, J2EE, and CGI scripting is standardization, since web services depend on standardized XML, giving a language-independent representation of data. Most web services transmit messages over HTTP, making them accessible as internet-scale applications. In RESTful web services, the interaction can be either synchronous or asynchronous, making them suitable for both request-response and one-way exchange patterns.

2.3.1 Web Services Protocol Stack


A web service protocol stack is a list of protocols that are utilized to define, find,
execute, and make Web services collaborate with one another. The web services protocol
stack not only covers the specifications of RESTful web services but also a SOAP-based
web services. This specification defines QoS properties along with different nonfunctional
requirements to guarantee a level of quality in message communication as well as reliable
transactional policies. The different components of WS protocol stack are categorized into
six layers as shown in Fig. 2.3.4.


Fig. 2.3.4 Web services Protocol Stack

Each layer in a WS protocol stack provides a set of standards and protocols for
successful working of Web services. The bottommost and first layer in protocol stack is
Transport Layer which is responsible for transporting a message between applications. It
supports different protocols based on the type of application like HTTP, Simple Mail
Transfer Protocol (SMTP), Java Messaging Services (JMS), Internet Interoperable Protocol
(IIOP) in CORBA etc.
The second layer in protocol stack is Messaging layer which is required for encoding
in transit messages in XML or other formats that are understood by both client and
server. This layer provides various protocols like SOAP, WS-Coordination, WS-
Transaction and WS-addressing for web services. The SOAP uses XML based request and
response messages to communicate between two parties. WS-Coordination provides
protocols that can coordinate the actions of distributed applications. It facilitates
transaction processing, workflow management, and other systems for coordination to
hide their proprietary protocols and to operate in a heterogeneous environment. WS-
Transaction specification describes the coordination types that are used with the
extensible coordination framework and perform transactions. WS-Transaction work on
WS- Coordination protocol whose communication patterns are asynchronous by default.
It defines two coordination types : Atomic Transaction (AT) for individual operations and


Business Activity (BA) for long running transactions. WS-addressing provides transport-
neutral mechanisms to address Web services and messages. It provides specification of
transport-neutral mechanism that allows web services to communicate addressing
information. It also gives interoperable constructs that convey information provided by
transport protocols and messaging systems.
The third layer in WS protocol stack is a Service Description layer which is used for
describing the public interface to a specific web service. It composed of four specifications
like WSDL, WS-Resource_Properties, WS-Policy and WS-Service_ Group.
The WSDL describes the services published by the provider and used by the recipient. The
WS-Resource_Properties provide a set of properties associated with web resources. It also
describes an interface to associate a set of typed values with a WS-Resource. The WS-
Policy allows web services to use XML to advertise their policies used by consumers. It
also represents a set of specifications that describe the capabilities and constraints of the
security policies on intermediaries and end points and WS-Service_Group describes an
interface for operating on collections of WS-Resources.
The fourth layer is Service Discovery layer that uses UDDI registry to register or
publish a web service written by provider and discover by consumer for the invocation. It
centralizes web services into a common registry so that web service provider can publish
their services with location and description, and makes it easy for consumer to discover
them that are available on the network.
The fifth layer in protocol stack is QoS (Quality of Service) layer. It has three
specifications, namely WS-Reliable_Messaging, WS-Security and WS-Resource_Lifetime.
The WS-Reliable_Messaging describes a protocol that allows SOAP messages to be
reliably delivered between distributed applications. The WS-Security provides a
specification that defines how security measures are implemented in web services to
protect them from external attacks and WS-Resource_Lifetime describes an interface to
manage the lifetime of a WS-Resource.
The sixth layer of protocol stack is a Composition layer which is used for composition
of business processes. It has two components namely BPEL4WS (Business Process
Execution Language for Web Service) and WS-Notification. The Business Process
Execution Language (BPEL) is a specification for describing business processes in a
portable XML format. BPEL4WS is a standard executable language for specifying
interactions between web services recommended by OASIS, where web services can be
composed together to make more complex web services and workflows. The goal of
BPEL4WS is to complete the business transaction, or fulfillment of the job of a service. The


WS-Notification is used to standardize the terminology, concepts, and operations needed


to express the basic roles involved in Web services publish and subscribe for notification
message exchange.

2.4 Publish-Subscribe Model


 The “Publish-Subscribe Model” describes a specific model for connecting source
and destination for a message transport. It is a design pattern that enables
asynchronous interaction among distributed applications. In this, the producer or publisher of the message (the distributor) labels the message in some style; often this is
done by associating at least one or more topic names from a (controlled)
vocabulary. At that point the receivers of the message (subscriber) will indicate the
topics for which they wish to receive related messages. On the other hand, one can
utilize content-based delivery system where the content is queried in some format.
The utilization of topic or content-based message selection is named as message
filtering. Note that in every case, we locate a many-to-many relationship between
publishers and subscribers.
 In certain cases, there is a many-to-many relationship between event publishers and
event subscribers, because multiple publishers and subscribers can arise for any type of event and their number varies dynamically. The publish-subscribe model works very well with databases, as it adds dynamicity to their static nature. The publish-subscribe mechanism can be either centralized or distributed. In a centralized publish-subscribe mechanism, a central server acts as a mediator for transmitting messages between publishers and subscribers. As a centralized server may lead to a single point of failure, the distributed publish-subscribe mechanism has become very popular. In a distributed publish-subscribe mechanism, publishers and subscribers are naturally decoupled from each other, which leaves publishers unconcerned with the potential consumers of their data, and subscribers unconcerned with the locations of the potential producers of that data.
 The Publish-subscribe systems are classified into two types namely topic-based
Publish-subscribe systems and content-based Publish-subscribe systems. In topic-
based systems, the publishers are responsible for generating events with respect to a
topic or subject. The Subscribers basically specify their interest in a particular topic,
and receive all events published on that topic.
 For subscribers, the event definition based on topic names is inflexible therefore
they filter the events belonging to general topics. The Content-based systems solve

this problem of event definition by introducing a subscription scheme based on the


contents of events. The Content-based systems are preferable as they give users the
ability to express their interest by specifying predicates over the values of a number
of well-defined attributes. The matching of publications (events) to subscriptions
(interest) is done based on the content. Distributed solutions are basically centered
around topic-based publish-subscribe systems. Since most of the database
technologies uses publish-subscribe systems for managing in transit data transfer
messages easily and efficiency.
 The high-level applications interact and regularly query to the database in order to
adapt their execution. For that, periodic data polling is both inefficient and unscalable. Therefore, the publish-subscribe mechanism can be used to solve many issues associated with database and application interaction. In a publish-subscribe interaction, event subscribers register to particular event types and receive notifications from the event publishers when they generate such events.
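A minimal topic-based broker can be sketched in a few lines of Python. This toy code only illustrates how publishers and subscribers stay decoupled; it does not represent any particular messaging product.

from collections import defaultdict

class Broker:
    # Toy topic-based publish-subscribe broker.
    def __init__(self):
        self._subscribers = defaultdict(list)    # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # The publisher does not know who (if anyone) receives the event.
        for callback in self._subscribers[topic]:
            callback(event)

broker = Broker()
broker.subscribe("orders", lambda e: print("billing saw:", e))
broker.subscribe("orders", lambda e: print("shipping saw:", e))
broker.publish("orders", {"id": "ORD-101", "status": "created"})

A content-based system would replace the topic name with a predicate evaluated over the attributes of each event before deciding which subscribers receive it.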

Fig. 2.4.1 Publish subscribe model for oracle database


 A database company, Oracle have introduced a publish subscribe solution for
enterprise information delivery and messaging for their database. It uses Advanced
Queuing mechanism which is fully integrated in the Oracle database to publish data
changes and to automate business process workflows for distributed applications.
The publish subscribe model for oracle database is shown in Fig. 2.4.1.
 The publish-subscribe model for oracle database allows messaging between
applications. The messages generated between publishers and subscribers are
streamed by advanced queuing engine that facilitate messages to be stored
persistently, propagate between queues on different databases, and transmitted


using Oracle Net Services. It provides features like rule-based subscription, message
broadcast, message listen, message notification, and high availability (HA),
scalability, and reliability to the application, queuing system and database.

2.5 Basics of Virtualization


As we know, large amounts of compute, storage, and networking resources are needed to build a cluster, grid or cloud solution. These resources need to be aggregated at one place to offer a single system image. Therefore, the concept of virtualization comes into the picture, where resources can be aggregated together to fulfill requests for resource provisioning rapidly as a single system image. Virtualization is a novel solution that can address application inflexibility, software manageability, sub-optimal resource utilization and security concerns in existing physical machines. In particular,
dynamically. Therefore, Virtualization technology is one of the fundamental components
of cloud computing. It provides secure, customizable, and isolated execution
environment for running applications on abstracted hardware. It is mainly used for
providing different computing environments. Although these computing environments
are virtual but appear like to be physical.

Fig. 2.5.1 Capability of Server with and without Virtualization


The term virtualization refers to the creation of a virtual, rather than actual, version of a hardware platform, operating system, storage or network resource. It allows multiple operating systems to run on a single physical machine called the host machine. Each instance of an operating system is called a virtual machine (VM), and the operating system running inside a virtual machine is called the guest operating system. The capability of a single server with and without virtualization is shown in Fig. 2.5.1.
Previously, industries used to keep separate physical servers for file storage, database, web hosting, email, etc. in their server rooms. Each server required separate hardware, an operating system, application software and administrators to manage it. Any failure in the server hardware could cause indefinite blocking of the services till it was restored, and the whole system might collapse.

Fig. 2.5.2 Traditional Servers Vs Virtualized Servers


Therefore, in search of a consolidation solution, the concept of virtualization came into the picture, as a virtualization solution allows multiple server operating systems to run on a single physical machine. It greatly saves the cost of purchasing extra physical servers, power consumption, manpower, licensing, etc. It also reduces the number of physical servers required for the deployment of applications, as shown in Fig. 2.5.2.
The operations supported by a virtualized environment allow users to create, delete, copy, migrate, snapshot, template and save the state of a VM, or roll back its execution. The purpose of virtualization is to enhance resource sharing among multiple users and to improve computing performance in terms of maximum resource utilization and application flexibility. To implement virtualization, specialized software called a Virtual Machine Manager (VMM) or hypervisor is required. A VMM is a piece of software that allows creating, running and managing multiple instances of operating systems (called virtual machines) over the shared hardware of the host machine. A VMM runs one or more virtual machines on a physical machine called the host machine, which can be any computer or a server.
The operating system running inside a virtual machine is called the Guest Operating System (Guest OS). Each virtual machine shares the hardware resources of the host machine (including CPU, memory, storage, I/O, and network) to run an independent virtual operating system.
A server running virtualization is shown in Fig. 2.5.3.

Fig. 2.5.3 Single server running Virtualization
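Where a hypervisor such as KVM is managed through libvirt, its Python binding can list the virtual machines sharing one host. Treat this as an environment-dependent illustration (it assumes the libvirt-python package and a running libvirt daemon), not as a required setup.

import libvirt    # Python binding for the libvirt virtualization API

conn = libvirt.open("qemu:///system")         # connect to the local KVM/QEMU hypervisor

for domain in conn.listAllDomains():          # every VM defined on this host
    state, reason = domain.state()
    status = "running" if state == libvirt.VIR_DOMAIN_RUNNING else "not running"
    print(domain.name(), status)

conn.close()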


2.5.1 Characteristics of Virtualization


The Virtualization allows organizations to use different computing services based on
aggregation. The different characteristics of virtualization are explained as follows.

1) Maximum resource utilization


The virtualization is intended to run multiple Guest OS over a single physical machine
which fully utilizes the resources. It does not keep CPU and other resources idle because
they will be shared among multiple virtual machines.

2) Reduces Hardware Cost


As the software abstraction layer provided by virtualization consolidates multiple servers into one or a few, it ultimately saves hardware cost. Because of virtualization, organizations do not have to set up and maintain huge infrastructure. It minimizes purchasing and installing a large number of servers for every application.

3) Minimize the maintenance cost


Due to the limited number of physical servers used with virtualization, organizations spend less on maintaining a few servers rather than many, and also need less manpower to maintain them.

4) Supports Dynamic Load balancing


The load balancing is required for optimum resources utilization and for faster
execution of complex jobs. It ensures that each machine in a network should have equal
amount of work load.

Fig. 2.5.4 Dynamic Load balancing


The load balancing is done by distributing the workload of heavy loaded machine in
to other lightly loaded machines. By default, virtualization supports dynamic load
balancing which is shown in Fig. 2.5.4.

5) Server Consolidation
Server consolidation in virtualization means aggregating multiple servers and their applications, which earlier required many physical computers each with its own operating system, into a single machine. It allows multiple servers to be consolidated into a single server, which makes optimum use of that server's resources. It is capable of running legacy software applications with old OS configurations and new applications running the latest OS together inside VMs. The concept of server consolidation is shown in Fig. 2.5.5.

Fig. 2.5.5 Server Consolidation

6) Disaster recovery
Disaster recovery is a critical component for IT organizations. It is required when a system crashes due to natural disasters like floods, earthquakes, etc. As mission-critical or business-critical applications sometimes run inside the virtual machines, such a crash can create huge business and economic losses. To take care of that, virtualization technology provides a built-in disaster recovery feature that enables a virtual machine image on one machine to be instantly diverted, migrated or re-imaged on another server if a failure occurs.

7) Easy VM management
The VMs running on one machine can be easily managed by copying, migrating,
templating or snapshotting on to another machine for backup. They can be easily
migrated in case of maintenance or can be deleted if they are not in use.

8) Maintaining Legacy applications and can test Beta Softwares


As virtualization can run multiple operating systems side by side, it allows users to run their legacy applications on a supported OS. It can also be used to run new software releases (beta software) without requiring a separate dedicated machine for testing.

9) Sandboxing
Virtual machines are useful to provide secure, isolated environments (sandboxes) for
running foreign or less-trusted applications. Virtualization technology can, thus, help
build secure computing platforms.

10) Virtual hardware


It can provide hardware solutions that never physically existed, like virtual storage, virtual SCSI drives, virtual Ethernet adapters, virtual Ethernet switches and hubs, and so on.

2.5.2 Pros and Cons of Virtualization


We have been highlighting the benefits of virtualization in terms of their technical
advantages. In addition to technical advantages, virtualization also offers the potential to
reduce capital expenditures. For instance, at a given point of time, only 10 % of the
average server is used. Most of the time, these servers are idle. When an organization
makes use of virtualization, the utilization can be as high as 80 %. A lot of computing
resources invested by the organization in the former case, without virtualization, do not
provide any benefit.
Thus, we can say that virtualization has certain pros that help the organization to
achieve efficiency (Some of the pros are covered in characteristics of virtualization).
The pros are as follows :
 Cost Reduction : Multiple OS and applications can be supported on a single
physical system, eliminating the need for purchase of additional servers for each OS or
application.


 Efficient resource utilization : Virtualization will isolate virtual machines from


each other and from the physical hardware; hence, utilization of the resource will be
optimized.
 Optimization : Along with physical servers, all the other resources, such as storage,
memory, etc., are also optimized for virtualization.
 Increased Return on Investment : In a traditional computing environment, most
resources remain unutilized and servers remain underutilized. But, with virtualization,
you can maximize resource utilization and reduce the amount of physical resources
deployed to maintain and administer these resources, which in turn leads to greater
profits.
 Budgeting : Virtualization enables flexible IT budgeting for an organization. This is
because most of the tasks, such as administration, maintenance, and management are
direct costs.
 Increased Flexibility : With virtualization you can run almost any application on
your system. This is because virtualization makes it possible to run multiple operating
systems and hardware configurations simultaneously on a single host.
However, there are also certain cons of virtualization, which are as follows :
 Upfront Investments : Organizations need to acquire resources beforehand to implement virtualization, and additional resources may need to be acquired over time.
 Performance Issues : Although virtualization is an efficient technique and its efficiency can be improved further by tuning, there are cases where performance is not as good as that of the actual physical systems.
 Licensing Issues : All software may not be supported on virtual platforms.
Although vendors are becoming aware of the increasing popularity of virtualization and
have started providing licenses for software to run on these platforms, the problem has
not completely vanished. Therefore, it is advised to check the licenses with the vendor
before using the software.
 Difficulty in Root Cause Analysis : The extra layer that virtualization adds increases complexity, and this increased complexity makes root cause analysis difficult in case of unidentified problems.

2.6 Types of Virtualization


Based on the resource that is virtualized, there are five basic types of virtualization, which are explained as follows.


2.6.1 Desktop Virtualization


The processing of multiple virtual desktops occurs on one or a few physical servers,
typically at the centralized data center. The copy of the OS and applications that each end
user utilizes will typically be cached in memory as one image on the physical server.
Desktop virtualization provides a virtual desktop environment where the client can access system resources remotely through the network. The ultimate goal of desktop virtualization is to make the computer operating system accessible from anywhere over the network. A virtual desktop environment does not require specific hardware resources on the client side; it requires just a network connection. The user can utilize a customized and personalized desktop from a remote location through the network connection. Desktop virtualization is sometimes referred to as Virtual Desktop Infrastructure (VDI), where the operating systems, such as Windows or Linux, are installed as virtual machines on a physical server at one place and delivered remotely through remote desktop protocols like RDP (in Windows) or VNC (in Linux).
Currently, VMware Horizon and Citrix XenDesktop are the two most popular VDI solutions available in the market. Although the desktop operating system provided by VDI is virtual, it appears like a physical desktop operating system. The virtual desktop can run all the types of applications that are supported on a physical computer; the only difference is that they are delivered through the network.
Some of the benefits provided by Desktop virtualization are :
 It provides easier management of devices and operating systems due to centralized
management.
 It reduces capital expenditure and the maintenance cost of hardware due to consolidation of multiple operating systems into a single physical server.
 It provides enhanced security, as confidential data is stored in the data center instead of on personal devices that could easily be lost, stolen or tampered with.
 With desktop virtualization, operating systems can be quickly and easily provisioned for new users without any manual setup.
 Upgrading the operating system is easier.
 It facilitates work from home for IT employees, as the desktop operating system is delivered over the internet.

2.6.2 Application Virtualization


Application virtualization is a technology that encapsulates an application from the underlying operating system on which it is executed. It enables access to an application without needing to install it on the local or target device. From the user's perspective, the application works and interacts as if it were native on the device. It allows the use of any cloud client that supports BYOD, such as a thin client, thick client, mobile client, PDA and so on.
Application virtualization utilizes software to bundle an application into a run-anywhere executable package. The software application is isolated from the operating system and runs in an environment called a "sandbox". There are two types of application virtualization : remote applications and streamed applications. In the first type, the remote application runs on a server, and the client uses some kind of remote display protocol to communicate with it. For a large number of administrators and users, it is fairly simple to set up a remote display protocol for applications. In the second type, one copy of the application runs on the server, and client desktops access and run the streamed application locally. With application streaming, the upgrade process is simpler, since you simply set up a new streamed package with the upgraded version and have the end users point to the new version of the application. Some of the popular application virtualization softwares in the market are VMware ThinApp, Citrix XenApp, Novell ZENworks Application Virtualization and so on.
Some of the prominent benefits of application virtualization are :
 It allows cross-platform operation, such as running Windows applications on Linux or Android and vice versa.
 It allows running legacy applications that are supported only on older operating systems.
 It avoids conflicts with other virtualized applications.
 It allows a user to run more than one instance of an application at the same time.
 It reduces system integration and administration costs by maintaining a common software baseline across multiple diverse computers in an organization.
 It allows incompatible applications to run side by side at the same time.
 It utilizes fewer resources than a separate virtual machine.
 It provides greater security because of the isolation between applications and the operating system.


2.6.3 Server Virtualization


Server virtualization is the process of dividing a physical server into multiple unique and isolated virtual servers by means of software. It partitions a single physical server into multiple virtual servers; each virtual server can run its own operating system and applications independently. The virtual server is also termed a virtual machine. This consolidation allows many virtual machines to run under a single physical server. Each virtual machine shares the hardware resources of the physical server, which leads to better utilization of the physical server's resources. The resources utilized by a virtual machine include CPU, memory, storage and networking. The hypervisor is the operating system or software that runs on the physical machine to perform server virtualization; it is responsible for providing resources to the virtual machines. Each virtual machine runs independently of the other virtual machines on the same box, with different operating systems that are isolated from each other.
The popular server virtualization softwares are VMware’s vSphere, Citrix Xen Server,
Microsoft’s Hyper-V, and Red Hat’s Enterprise Virtualization.
The benefits of server virtualization are :
 It gives quick deployment and provisioning of virtual operating systems.
 It reduces capital expenditure by consolidating multiple servers into a single physical server, which eliminates the cost of multiple physical machines.
 It provides ease of development and testing.
 It makes optimum use of the physical server's resources.
 It provides centralized server administration and a disaster recovery feature.
 It reduces cost because less hardware is required.

2.6.4 Storage Virtualization


Storage virtualization is the process of grouping multiple physical storage devices using software so that they appear as a single storage device in virtual form. It pools the physical storage from different network storage devices and makes it appear to be a single storage
unit that is handled from a single console. Storage virtualization helps to address the
storage and data management issues by facilitating easy backup, archiving and recovery
tasks in less time. It aggregates the functions and hides the actual complexity of the
storage area network. Storage virtualization can be implemented with data storage technologies like snapshots and RAID that take physical disks and present them in a virtual format. These features add redundancy to the storage and give optimum performance by presenting the storage to the host as a volume. Virtualizing storage separates the storage management software from the underlying hardware infrastructure in order to provide more flexible and scalable pools of storage resources. The benefits provided by storage virtualization are :
 Automated management of storage media with reduced downtime.
 Enhanced storage management in a heterogeneous IT environment.
 Better storage availability and optimum storage utilization.
 It gives scalability and redundancy in storage.
 It provides advanced features like disaster recovery, high availability, consistency, replication and de-duplication of data.
 Backup and recovery are easier and more efficient with storage virtualization.
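The idea of presenting several physical devices as one logical address space can be illustrated with a small sketch. The following Python code is only a toy model under assumed device names and sizes; real storage virtualization is performed by volume managers and SAN controllers, not by application code like this.

# Toy model of storage virtualization : several physical devices are pooled and
# exposed as one contiguous logical block address space (a linear concatenation,
# similar in spirit to an LVM linear volume). Illustrative only.

class StoragePool:
    def __init__(self, devices):
        # devices : list of (device_name, size_in_blocks) tuples (hypothetical values)
        self.devices = devices
        self.total_blocks = sum(size for _, size in devices)

    def map_block(self, logical_block):
        """Translate a logical block number into (device_name, physical_block)."""
        if not 0 <= logical_block < self.total_blocks:
            raise ValueError("logical block out of range")
        offset = logical_block
        for name, size in self.devices:
            if offset < size:
                return name, offset      # the block lives on this device
            offset -= size               # skip past this device
        raise AssertionError("unreachable")

# Three hypothetical disks pooled into one 3500-block logical volume.
pool = StoragePool([("disk-a", 1000), ("disk-b", 2000), ("disk-c", 500)])
print(pool.total_blocks)        # 3500 : what the host sees as one device
print(pool.map_block(2500))     # ('disk-b', 1500) : the hidden physical layout

The host sees one 3500-block volume, while the mapping of each logical block to a physical disk stays hidden behind the pool, which is exactly the abstraction storage virtualization provides.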

2.6.5 Network Virtualization


Network virtualization is the ability to create virtual networks that are decoupled from the underlying network hardware. This ensures the network can better integrate with and support increasingly virtual environments. It has the capability to combine multiple physical networks into one virtual network, or to divide one physical network into separate, independent virtual networks.
Network virtualization can combine an entire network into a single manageable unit and allocate its bandwidth, channels, and other resources based on its workload. Network virtualization is similar to server virtualization, but instead of dividing up a physical server among several virtual machines, physical network resources are divided up among multiple virtual networks. Network virtualization uses specialized software to perform network functions by decoupling the virtual networks from the underlying network hardware. Once network virtualization is established, the physical network is only used for packet forwarding, and network management is done using virtual or software-based switches. VMware's NSX platform is a popular example of network virtualization; it decouples network services from the underlying hardware and allows virtual provisioning of an entire network. The physical network resources, such as


switches and routers, are pooled and accessible by any user via a centralized management
system. The benefits of network virtualization are
 It consolidates the physical hardware of a network into a single virtual network, which reduces the management overhead of network resources.
 It gives better scalability and flexibility in network operations.
 It provides automated provisioning and management of network resources.
 It reduces the hardware requirements, with a corresponding reduction in power consumption.
 It is cost effective as it reduces the number of physical devices required.

2.7 Implementation Levels of Virtualization


Virtualization is implemented at various levels by creating a software abstraction layer between the host OS and the guest OS. The main function of this software layer is to virtualize the physical hardware of the host machine into virtual resources used by the VMs through various operational layers. The different levels at which virtualization can be implemented are shown in Fig. 2.7.1.

Fig. 2.7.1 Implementation Levels of Virtualization

There are five implementation levels of virtualization, namely the Instruction Set Architecture (ISA) level, hardware level, operating system level, library support level and application level, which are explained as follows.


1) Instruction Set Architecture Level


 Virtualization at the instruction set architecture level is implemented by emulating an instruction set architecture completely in software. An emulator executes the instructions issued by the guest machine (the virtual machine being emulated) by translating them into a set of native instructions and then executing them on the available hardware.
 That is, the emulator works by translating instructions from the guest platform into instructions of the host platform. These instructions include both processor-oriented instructions (add, sub, jump, etc.) and I/O-specific (IN/OUT) instructions for the devices. Although this virtual machine architecture works fine in terms of simplicity and robustness, it has its own pros and cons.
 The advantages of ISA-level virtualization are that it provides ease of implementation while dealing with multiple platforms, and that it can provide the infrastructure through which virtual machines based on x86 can be created on other platforms such as SPARC and Alpha. The disadvantage is that every instruction issued by the emulated computer needs to be interpreted in software first, which degrades performance.
 The popular emulators of ISA level virtualization are :

a) Bochs
It is a highly portable emulator that can be run on most popular platforms, including x86, PowerPC, Alpha, Sun SPARC, and MIPS. It can be compiled to emulate most versions of x86 machines, including the 386, 486, Pentium, Pentium Pro or AMD64 CPUs, with optional MMX, SSE, SSE2, and 3DNow instructions.

b) QEMU
QEMU (Quick Emulator) is a fast processor emulator that uses a portable dynamic translator. It supports two operating modes : user space only, and full system emulation. In the former mode, QEMU can launch Linux processes compiled for one CPU on another CPU, or be used for cross-compilation and cross-debugging. In the latter mode, it can emulate a full system that includes a processor and several peripheral devices. It supports emulation of a number of processor architectures, including x86, ARM, PowerPC, and SPARC.

c) Crusoe
The Crusoe processor comes with a dynamic x86 emulator, called the code morphing engine, that can execute any x86-based application on top of it. The Crusoe is designed to


handle the x86 ISA’s precise exception semantics without constraining speculative
scheduling. This is accomplished by shadowing all registers holding the x86 state.

d) BIRD
BIRD is an interpretation engine for x86 binaries that currently supports only x86 as
the host ISA and aims to extend for other architectures as well. It exploits the similarity
between the architectures and tries to execute as many instructions as possible on the
native hardware. All other instructions are supported through software emulation.
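To make the idea of ISA-level emulation concrete, the following Python sketch interprets a tiny made-up guest instruction set by translating each guest instruction into operations on the host. It is only a minimal illustration of the fetch-decode-emulate loop that emulators such as Bochs or QEMU perform at far greater scale and speed; the instruction names and register file are invented for this example.

# Minimal fetch-decode-emulate loop for a made-up three-instruction guest ISA.
# Every guest instruction is interpreted in software on the host, which is
# exactly why pure ISA-level emulation is flexible but slow.

def emulate(program):
    regs = {"r0": 0, "r1": 0, "r2": 0}   # guest register file (invented)
    pc = 0                                # guest program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "mov":                   # mov reg, immediate
            regs[args[0]] = args[1]
        elif op == "add":                 # add dst, src  (dst = dst + src)
            regs[args[0]] += regs[args[1]]
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown guest instruction: {op}")
        pc += 1
    return regs

guest_program = [
    ("mov", "r0", 5),
    ("mov", "r1", 7),
    ("add", "r0", "r1"),
    ("halt",),
]
print(emulate(guest_program))   # {'r0': 12, 'r1': 7, 'r2': 0}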

2) Hardware Abstraction Layer


 Virtualization at the Hardware Abstraction Layer (HAL) exploits the similarity in the architectures of the guest and host platforms to cut down the interpretation latency. The time spent interpreting guest-platform instructions into host-platform instructions is reduced by taking advantage of the similarities that exist between them. This virtualization technique maps the virtual resources to physical resources and uses the native hardware for computations in the virtual machine. It creates a virtual hardware environment which virtualizes the computer resources like CPU, memory and I/O devices.
 For HAL virtualization to work correctly, every privileged instruction executed by a VM must be trapped and passed to the underlying VMM, because multiple VMs running their own OSes may issue privileged instructions that need the full attention of the CPU. If this is not managed properly, a VM may crash the machine instead of the instruction being trapped and sent to the VMM as an exception. However, the most popular platform, x86, is not fully virtualizable, because certain privileged instructions fail silently rather than being trapped when executed with insufficient privileges. Some of the popular HAL virtualization tools are :

a) VMware
The VMware products are targeted towards x86-based workstations and servers. Thus,
it has to deal with the complications that arise as x86 is not a fully-virtualizable
architecture. The VMware deals with this problem by using a patent-pending technology
that dynamically rewrites portions of the hosted machine code to insert traps wherever
VMM intervention is required. Although it solves the problem, it adds some overhead
due to the translation and execution costs. VMware tries to reduce the cost by caching the
results and reusing them wherever possible. Nevertheless, it again adds some caching
cost that is hard to avoid.


b) Virtual PC
The Microsoft Virtual PC is based on the Virtual Machine Monitor (VMM) architecture and lets the user create and configure one or more virtual machines. It provides most of the same functions as VMware, with additional functions such as the undo disk operation, which lets the user easily undo previous operations on the hard disks of a VM. This enables easy data recovery and may come in handy in several circumstances.

c) Denali
The Denali project was developed at the University of Washington to address the issue of VM scalability. It introduced a new virtualization architecture, also called para-virtualization, to support thousands of simultaneous machines, which are called lightweight virtual machines. It tries to increase the scalability and performance of the virtual machines without too much implementation complexity.

3) Operating System Level Virtualization


 The operating system level virtualization is an abstraction layer between the OS and user applications. It allows multiple operating system instances and their applications to run simultaneously without requiring a reboot or dual boot. The degree of isolation of each instance is very high, and it can be implemented at low risk with easy maintenance. Implementing operating system level virtualization includes operating system installation, application suite installation, network setup, and so on. Therefore, if the required OS is the same as the one on the physical machine, the user basically ends up duplicating most of the effort he/she has already invested in setting up the physical machine. To run applications properly, the operating system keeps the application-specific data structures, user-level libraries, environment settings and other requisites separately.
 The key idea behind all OS-level virtualization techniques is that the virtualization layer above the OS produces, on demand, a partition per virtual machine that is a replica of the operating environment on the physical machine. With a careful partitioning and multiplexing technique, each VM can export a full operating environment and remain fairly isolated from the other VMs and from the underlying physical machine.
 The popular OS level virtualization tools are

a) Jail
The Jail is a FreeBSD based virtualization software that provides the ability to partition
an operating system environment, while maintaining the simplicity of UNIX ”root”

model. The environments captured within a jail are typical system resources and data
structures such as processes, file system, network resources, etc. A process in a partition is
referred to as “in jail” process. When the system is booted up after a fresh install, no
processes will be in jail. When a process is placed in a jail, all of its descendants after the
jail creation, along with itself, remain within the jail. A process may not belong to more
than one jail. Jails are created by a privileged process when it invokes the special system call jail. Every call to jail creates a new jail; the only way for a new process to enter the jail is by inheriting access to the jail from another process that is already in that jail.

b) Ensim
The Ensim virtualizes a server's native operating system so that it can be partitioned into isolated computing environments called virtual private servers. These virtual private servers operate independently of each other, just like dedicated servers. It is commonly used for creating hosting environments that allocate hardware resources among a large number of distributed users.

4) Library Level Virtualization


Most systems use an extensive set of Application Programming Interfaces (APIs) instead of legacy system calls to implement various libraries at the user level. Such APIs are designed to hide the operating-system-related details and keep programming simpler for ordinary programmers. In this technique, the virtual environment is created above the OS layer, and it is mostly used to implement a different Application Binary Interface (ABI) and Application Programming Interface (API) using the underlying system.
An example of library level virtualization is WINE. Wine is an implementation of the Windows API, and can be used as a library to port Windows applications to UNIX. It is a virtualization layer on top of X and UNIX that exports the Windows API/ABI, which allows Windows binaries to run on top of it.

5) Application Level Virtualization


In this abstraction technique, the operating system and user-level programs execute like applications on top of an abstract machine. Therefore, specialized instructions are needed for hardware manipulation, such as I/O-mapped instructions (manipulating the I/O) and memory-mapped instructions (mapping a chunk of memory to the I/O and then manipulating the memory). The group of such special instructions constitutes the abstract machine used for application level virtualization. The Java Virtual Machine (JVM) is a popular example of application level virtualization; it creates a virtual machine at the application level rather than the OS level and supports a self-defined instruction set called Java bytecode.


Such VMs pose little security threat to the system while letting the user work with them like physical machines. Like a physical machine, the VM has to provide an operating environment to its applications, either by hosting a commercial operating system or by providing its own environment.
The comparison between different levels of virtualization is shown in Table 2.7.1.
Implementation Level                     | Performance | Application Flexibility | Implementation Complexity | Application Isolation
-----------------------------------------|-------------|-------------------------|---------------------------|----------------------
Instruction Set Architecture (ISA) Level | Very Poor   | Very Good               | Medium                    | Medium
Hardware Abstraction Level (HAL)         | Very Good   | Medium                  | Very Good                 | Good
Operating System Level                   | Very Good   | Poor                    | Medium                    | Poor
Library Level                            | Medium      | Poor                    | Poor                      | Poor
Application Level                        | Poor        | Poor                    | Very Good                 | Very Good

Table 2.7.1 Comparison between different implementation levels of virtualization

2.8 Virtualization Structures


In the previous sections we have already seen the basics of virtualization : it is nothing but the creation of a virtual version of a hardware platform, operating system, storage or network resource rather than an actual one. It allows multiple operating systems to run on a single physical machine called the host machine. Each instance of an operating system is called a Virtual Machine (VM), and the operating system running inside a virtual machine is called the guest operating system. Depending on the position of the virtualization layer, there are two classes of VM architectures, namely the bare-metal and host-based hypervisor architectures. The hypervisor is the software used for performing virtualization, also known as the VMM (Virtual Machine Monitor). Hypervisor software provides two different structures of virtualization, namely the hosted structure (also called Type 2 virtualization) and the bare-metal structure (also called Type 1 virtualization), explained in the following sections.

2.8.1 Hosted Structure (Type II)


In the hosted structure, the guest OS and applications run on top of a base or host OS with the help of the VMM (called the hypervisor). The VMM stays between the base OS and the guest OS. This approach provides better hardware compatibility because the base OS is


responsible for providing hardware drivers to the guest OS instead of the VMM. In this type, the hypervisor has to rely on the host OS for pass-through permissions to access the hardware. In many cases, a hosted hypervisor needs an emulator, which lies between the guest OS and the VMM, to translate instructions into native format. The hosted structure is shown in Fig. 2.8.1.

Fig. 2.8.1 Hosted Structure (Type II Hypervisor)


To implement the hosted structure, a base OS needs to be installed first, over which the VMM can be installed. The hosted structure is a simple solution for running multiple desktop OSes independently. Fig. 2.8.2 (a) and (b) show Windows running on a Linux base OS and Linux running on a Windows base OS using a hosted hypervisor.

Fig. 2.8.2 Hosted Hypervisors


The popular hosted hypervisors are QEMU, VMware Workstation, Microsoft Virtual
PC, Oracle VirtualBox etc.
The advantages of hosted structure are
 It is easy to install and manage without disturbing host systems hardware.
 It supports legacy operating systems and applications.
 It provides ease of use with greater hardware compatibility.
 It does not require installing any drivers for I/O devices, as they are provided through the host's built-in driver stack.
 It can be used for testing beta software.
 The hosted hypervisors are usually free software and can be run on user
workstations.
The disadvantages of hosted structure are
 It does not allow the guest OS to access the hardware directly; every access has to go through the base OS, which increases resource overhead.
 Virtual machine performance is slower and degraded because hardware access relies on the intermediate host OS.
 It does not scale well beyond a certain limit.

2.8.2 Bare-Metal Structure (Type I)


 In the bare-metal structure, the VMM is installed directly on top of the hardware; therefore, no intermediate host OS is needed. The VMM can communicate directly with the hardware and does not rely on a host system for pass-through permissions, which results in better performance, scalability and stability.
The Bare-Metal structure is shown in Fig. 2.8.3. (See Fig. 2.8.3 on next page).
 Bare-metal virtualization is mostly used in enterprise data centers for getting the
advanced features like resource pooling, high availability, disaster recovery and
security.
 The screenshot of Xen Server is shown in Fig. 2.8.4 (a) and its management console
called Xen center is shown in Fig. 2.8.4 (b). (See Fig. 2.8.4 on next page).


Fig. 2.8.3 Bare-Metal Structure (Type-I Hypervisor)

Fig. 2.8.4 Bare-Metal Xen Server Hypervisor

The popular Bare-Metal Hypervisors are Citrix Xen Server, VMware ESXI and
Microsoft Hyper V.
The advantages of Bare-Metal structure are
 It is faster in performance and more efficient to use.
 It provides enterprise features like high scalability, disaster recovery and high
availability.
 It has high processing power due to the resource pooling.


 It has lower overhead or maintenance cost.


 It provides ease of backup and recovery.
 It provides built-in fault-tolerance mechanisms.
 It has improved mobility and security.
The disadvantages of the bare-metal structure are
 It has limited hardware support and a smaller stack of device drivers.
 It has a high implementation cost.
 It requires specialized servers to install and run the hypervisor and does not run on user workstations.
 In some cases, it becomes complex to manage.

2.9 Virtualization Tools and Mechanisms


The hypervisor provides hypercalls for the guest OSes and applications to execute privileged instructions. Depending on the functionality, there are two hypervisor architectures, namely the micro-kernel hypervisor architecture used by Microsoft Hyper-V and the monolithic hypervisor architecture used by VMware ESX for server virtualization. The micro-kernel architecture of a hypervisor includes only the basic and unchanging functions like physical memory management and processor scheduling; the dynamic components and device drivers stay outside the hypervisor. In a monolithic hypervisor architecture, on the other hand, the dynamic and changeable functions, including the device drivers, are implemented inside the hypervisor, which handles all the aforementioned functions like CPU scheduling, memory management and I/O management. That is why the hypervisor code of a micro-kernel hypervisor is always smaller than that of a monolithic hypervisor. In the previous section we learned about the hosted and bare-metal virtualization structures. The upcoming sections explain the different virtualization tools and mechanisms.

2.9.1 Virtualization Tools


There are many virtualization tools available in the market; two of the most popular open source tools, Xen and KVM, are explained as follows.

A) Xen
Xen is an open source bare-metal (Type I) hypervisor developed at the University of Cambridge. It runs on top of the hardware without needing a host operating system. The absence of a host OS eliminates the need for pass-through permissions. Xen is a microkernel hypervisor, which separates the policy from the mechanism. It provides a

virtual environment located between the hardware and the OS. As the Xen hypervisor runs directly on the hardware, it can run many guest operating systems on top of it. The operating system platforms supported as guest OSes by the Xen hypervisor include Windows, Linux, BSD and Solaris.
The architecture of the Xen hypervisor is shown in Fig. 2.9.1.

Fig. 2.9.1 Xen architecture

There are three core-components of the Xen system, namely kernel, hypervisor and
applications. It is important to note that the organization of these three components is
specific. The Xen hypervisor implements all the mechanisms, leaving the policy to be
handled by Domain 0.
The guest OS which has control ability is called Domain 0, and the others are called Domain U. Domain 0 is the privileged guest OS of the Xen system and is responsible for controlling the functionality of the entire system. Domain 0, which typically acts like a VMM, is the first domain to be loaded when Xen starts, before any file system drivers are available. Domain 0 handles the following operations :
 Allocates or maps hardware resources to the Domain U (guest) domains.
 Manages all other VMs.
 Creates, copies, saves, reads, modifies, shares, migrates, and roll backs VMs.
 Accesses the underlying hardware.
 Manages IO and other devices.
The Xen hypervisor does not natively include any device drivers for the guest OSes. Instead, it provides a mechanism by which a guest OS can have direct access to the physical devices; that is why the size of the Xen hypervisor is kept rather small. Domain 0 is very crucial to the Xen hypervisor and needs to be protected, because if the security of the Domain 0 OS is compromised by an intruder, he/she would gain control of the entire system. Since Domain 0 behaves like a VMM, any compromise of its security may allow intruders to create, copy, save, read, modify, share, migrate and roll back VMs as easily as manipulating a file.

B) KVM (Kernel-Based VM)


The Kernel-based Virtual Machine (KVM) is an open source hosted (Type II) hypervisor built into the Linux kernel. It is a system virtualization solution that offers full virtualization support on x86 hardware with the Intel VT or AMD-V virtualization extensions. KVM is a hardware-assisted virtualization tool that can host several virtual machines running unmodified Windows or Linux OS images. Each virtual machine has private virtualized hardware like a network card, disk, graphics adapter, etc. It supports unmodified guest OSes such as Windows, Linux, Solaris, and other UNIX variants. In addition, KVM offers para-virtualized devices (through the VirtIO framework), for which the guest operating system needs suitable drivers.
In KVM, Memory management and scheduling activities are carried out by the
existing Linux kernel. It can support hardware-assisted virtualization and
para-virtualization by using the Intel VT-x or AMD-v and VirtIO framework,
respectively. The VirtIO framework includes a paravirtual Ethernet card, a disk I/O
controller, a balloon device for adjusting guest memory usage, and a VGA graphics
interface using VMware drivers.
A Kernel-based Virtual Machine contains two main components, which are as follows :
 A loadable kernel module (kvm.ko) that provides the core virtualization infrastructure, plus a processor-specific module (kvm-intel.ko for Intel processors or kvm-amd.ko for AMD processors) that provides the processor-specific support.
 A user space program that controls and manages the virtual machines and offers emulation for virtual devices, for example qemu-system-ARCH.
Fig. 2.9.2 illustrates the architecture of the KVM virtualization.


Fig. 2.9.2 KVM Virtualization Architecture

In KVM, a quick emulator (QEMU) is required for emulating the native and privileged instructions issued by the guest OS. In the KVM architecture, the QEMU process runs as a user space process on top of the Linux kernel with the KVM module, and the guest kernel runs on top of the emulated hardware in QEMU. QEMU can co-work with KVM for hardware-based virtualization. With hardware-based virtualization, QEMU does not have to emulate all CPU instructions, so it works really fast.
Some of the important features provided by KVM are :
 Supports 32-bit and 64-bit guest OSes (on 64-bit hosts)
 Supports hardware virtualization features
 Provides para-virtualized drivers for guest OSes
 Provides snapshots of virtual machines
 Gives delta images of virtual machines along with PCI passthrough
 Kernel same-page merging
 Supports CPU and PCI hot plug features
 Has a built-in QEMU Monitor Protocol (QMP) and a KVM paravirtual clock
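On a Linux host with KVM, virtual machines are commonly managed through the libvirt API rather than by talking to the kernel module directly. The sketch below assumes that the libvirt Python bindings (for example the python3-libvirt package) are installed and that a local qemu:///system hypervisor is reachable; it simply lists the defined domains and their state and is illustrative rather than a complete management tool.

# List KVM/QEMU virtual machines on the local host via libvirt.
# Assumes python3-libvirt is installed and qemu:///system is accessible.
import libvirt

try:
    conn = libvirt.openReadOnly("qemu:///system")   # read-only connection to the hypervisor
except libvirt.libvirtError as err:
    raise SystemExit(f"cannot connect to hypervisor: {err}")

for dom in conn.listAllDomains():
    state, max_mem, mem, vcpus, cpu_time = dom.info()
    status = "running" if dom.isActive() else "shut off"
    print(f"{dom.name():20s} {status:10s} vCPUs={vcpus} mem={max_mem // 1024} MiB")

conn.close()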

2.9.2 Virtualization Mechanisms


Every hypervisor uses some mechanisms to control and manage virtualization
strategies that allow different operating systems such as Linux and Windows to be run on
the same physical machine, simultaneously. Depending on the position of the


virtualization layer, there are several classes of VM mechanisms, namely the binary
translation, para-virtualization, full virtualization, hardware assist virtualization and
host-based virtualization. The mechanisms of virtualization defined by VMware and
other virtualization providers are explained as follows.

2.9.2.1 Binary Translation with Full Virtualization

Based on the implementation technology, hardware virtualization can be classified into two categories, namely full virtualization with binary translation and host-based virtualization. The binary translation mechanism and the full and host-based virtualization approaches are explained as follows.

a) Binary translation
In binary translation of the guest OS, the VMM runs at Ring 0 and the guest OS at Ring 1. The VMM scans the instruction stream and identifies the privileged, control-sensitive and behavior-sensitive instructions. When these instructions are identified, they are trapped into the VMM, which emulates their behavior. The method used in this emulation is called binary translation. The binary translation mechanism is shown in Fig. 2.9.3.

Fig. 2.9.3 Binary Translation mechanism

b) Full Virtualization
In full virtualization, the guest OS does not require any modification to its code. Instead, the VMM relies on binary translation to trap and virtualize the execution of certain sensitive, non-virtualizable instructions. Guest operating systems and their applications consist of both critical and noncritical instructions, which are executed with the help of the binary translation mechanism. With full virtualization,


noncritical instructions run on the hardware directly, while critical instructions are discovered and replaced with traps into the VMM to be emulated by software. In host-based virtualization, both a host OS and a guest OS take part, with a virtualization software layer lying between them.
Therefore, full virtualization works with binary translation to perform direct execution of instructions, where the guest OS is completely decoupled from the underlying hardware and, consequently, is unaware that it is being virtualized. Full virtualization can give degraded performance, because instructions must first be binary translated rather than executed directly, which is time-consuming. In particular, full virtualization of I/O-intensive applications is a real challenge. Binary translation employs a code cache to store translated instructions to improve performance, but this increases the cost of memory usage.

c) Host-based virtualization
In host-based virtualization, the virtualization layer runs on top of the host OS and the guest OS runs over the virtualization layer. Therefore, the host OS is responsible for managing the hardware and controlling the instructions executed by the guest OS. Host-based virtualization does not require modifying the host OS code, but the virtualization software has to rely on the host OS to provide device drivers and other low-level services. This architecture simplifies the VM design and eases deployment, but it gives degraded performance compared to other hypervisor architectures because of host OS intervention. The host OS performs four layers of mapping for every I/O request by the guest OS or VMM, which downgrades performance significantly.

2.9.2.2 Para-Virtualization

Para-virtualization is an efficient virtualization technique that requires explicit modification of the guest operating systems. A para-virtualized VM provides special APIs that require corresponding modifications in the guest OS. In some virtualized systems, performance degradation becomes a critical issue; para-virtualization therefore attempts to reduce the virtualization overhead, and thus improve performance, by modifying only the guest OS kernel. The para-virtualization architecture is shown in Fig. 2.9.4.


Fig. 2.9.4 Para-virtualization architecture

The x86 processor uses four instruction execution rings, namely Ring 0, 1, 2 and 3. Ring 0 has the highest privilege for executing instructions, while Ring 3 has the lowest. The OS is responsible for managing the hardware and executing the privileged instructions in Ring 0, while user-level applications run in Ring 3. The KVM hypervisor is a well-known example of para-virtualization support. The functioning of para-virtualization is shown in Fig. 2.9.5.

Fig. 2.9.5 Para-virtualization (Source : VMware)

In para-virtualization, the virtualization layer is inserted between the hardware and the OS. Since the x86 ring definitions require the virtualization layer to be installed at Ring 0, other instructions issued at Ring 0 may cause problems. In this architecture, the non-virtualizable instructions are replaced with hypercalls that communicate directly with the hypervisor or VMM. User applications are executed directly on the host system hardware upon user request.


Para-virtualization also has disadvantages : although it reduces CPU overhead, it still has many issues with the compatibility and portability of the virtual system, it incurs high costs for implementation and maintenance, and the performance gain varies with workload variation. Popular examples of para-virtualization are Xen, KVM, and VMware ESXi.
a) Para-Virtualization with Compiler Support
Para-virtualization handles privileged instructions at compile time rather than at run time. Whereas the full virtualization architecture executes sensitive privileged instructions by intercepting and emulating them at runtime, para-virtualization with compiler support modifies the guest OS kernel to replace the privileged and sensitive instructions with hypercalls to the hypervisor or VMM at compile time itself. The Xen hypervisor assumes such a para-virtualization architecture.
Here, a guest OS running in a guest domain may run at Ring 1 instead of Ring 0, which is why the guest OS may not be able to execute some privileged and sensitive instructions. Such privileged instructions are therefore implemented by hypercalls to the hypervisor. After replacing the instructions with hypercalls, the modified guest OS emulates the behavior of the original guest OS.
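The difference from binary translation is that, in para-virtualization, the guest kernel is modified in advance so that sensitive operations are issued as explicit hypercalls. The Python sketch below models this idea only; the class, function and hypercall names are invented for illustration and do not correspond to any real hypervisor interface.

# Conceptual model of para-virtualization : the guest kernel is modified so that
# privileged operations become explicit hypercalls to the hypervisor.
# All names here are invented for illustration.

class Hypervisor:
    """Exposes a small hypercall interface to its para-virtualized guests."""
    def hypercall(self, name, **kwargs):
        handlers = {
            "update_page_table": self._update_page_table,
            "set_timer": self._set_timer,
        }
        return handlers[name](**kwargs)

    def _update_page_table(self, vaddr, paddr):
        return f"hypervisor mapped virtual {hex(vaddr)} -> machine {hex(paddr)}"

    def _set_timer(self, ticks):
        return f"hypervisor armed timer for {ticks} ticks"

class ModifiedGuestKernel:
    """A guest kernel whose privileged code paths were replaced at 'compile time'."""
    def __init__(self, hypervisor):
        self.hv = hypervisor

    def map_memory(self, vaddr, paddr):
        # Instead of writing page tables directly (a privileged operation),
        # the para-virtualized kernel asks the hypervisor to do it.
        return self.hv.hypercall("update_page_table", vaddr=vaddr, paddr=paddr)

guest = ModifiedGuestKernel(Hypervisor())
print(guest.map_memory(0x4000, 0x9F000))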

2.10 Virtualization of CPU


CPU virtualization is related to the protection levels, called rings, in which code can execute. The Intel x86 CPU architecture offers four levels of privilege, known as Ring 0, 1, 2 and 3.

Fig. 2.10.1 CPU Privilege Rings



Among these, Ring 0, Ring 1 and Ring 2 are associated with the operating system, while Ring 3 is reserved for applications; the rings manage access to the computer hardware. Ring 0 is used by the kernel and therefore has the highest privilege level, while Ring 3 has the lowest privilege as it belongs to user-level applications, as shown in Fig. 2.10.1.
While user-level applications typically run in Ring 3, the operating system needs direct access to the memory and hardware and must execute its privileged instructions in Ring 0. Therefore, virtualizing the x86 architecture requires placing a virtualization layer under the operating system to create and manage the virtual machines that deliver shared resources. Some sensitive instructions cannot be virtualized as they have different semantics, and trapping and translating these sensitive and privileged instructions at runtime is the key challenge. The x86 privilege level architecture without virtualization is shown in Fig. 2.10.2.

Fig. 2.10.2 X86 privilege level architecture without virtualization

In most virtualization systems, the majority of VM instructions are executed on the host processor in native mode; hence, the unprivileged instructions of VMs can run directly on the host machine for higher efficiency. The critical instructions, however, need to be handled carefully for correctness and stability. These critical instructions are categorized into three types, namely privileged instructions, control-sensitive instructions and behavior-sensitive instructions. Privileged instructions execute in a privileged mode and are trapped if executed outside this mode. Control-sensitive instructions attempt to change the configuration of the resources used during execution, while behavior-sensitive instructions behave differently depending on the configuration of resources, including the load and store operations over virtual memory. Generally, a CPU architecture is virtualizable if and only if it supports running the VM's privileged and unprivileged instructions in the CPU's user mode while the VMM runs in supervisor mode. When the privileged instructions, along with the control- and behavior-sensitive instructions, of a VM are executed, they are trapped into the VMM. In this way, the VMM becomes the unified mediator for hardware access from different VMs and guarantees the correctness and stability of the whole system.
However, not all CPU architectures are virtualizable. There are three techniques that can be used for handling sensitive and privileged instructions to virtualize the CPU on the x86 architecture :
1) Binary translation with full virtualization
2) OS assisted virtualization or para-virtualization
3) Hardware assisted virtualization
The above techniques are explained in detail as follows.

1) Binary translation with full virtualization


In binary translation, the virtual machine issues privileged instructions contained within its compiled code. The VMM takes control of these instructions and changes the code under execution so that it does not affect the state of the system. The full virtualization technique does not require modifying the guest operating system; it relies on binary translation to trap and virtualize the execution of certain instructions. The noncritical instructions run directly on the hardware, while critical instructions have to be discovered first and then replaced with traps into the VMM to be emulated by software. This combination of binary translation and direct execution provides full virtualization, as the guest OS is completely decoupled from the underlying hardware by the virtualization layer. The guest OS is not aware that it is being virtualized and requires no modification. The performance of full virtualization may not be ideal, because it involves binary translation at run-time, which is time consuming and can incur a large performance overhead. Full virtualization offers the best isolation and security for virtual machines, and simplifies migration and portability, as the same guest OS instance can run virtualized or on native hardware. Full virtualization with binary translation is used by VMware's and Microsoft's hypervisors. Binary translation with full virtualization is shown in Fig. 2.10.3.

Fig. 2.10.3 Binary Translation with Full Virtualization


2) OS assisted virtualization or para-virtualization


The para-virtualization technique introduces communication between the guest OS and the hypervisor to improve performance and efficiency. Para-virtualization involves modifying the OS kernel so that non-virtualizable instructions are replaced with hypercalls that communicate directly with the virtualization layer or hypervisor. A hypercall is based on the same concept as a system call : a call made by the guest OS to the hypervisor is called a hypercall. In para-virtualization, the hypervisor is responsible for providing hypercall interfaces for other critical kernel operations such as memory management, interrupt handling and timekeeping.
Fig. 2.10.4 shows para-virtualization.

Fig. 2.10.4 Para-virtualization

3) Hardware Assisted Virtualization (HVM)


This technique attempts to simplify virtualization, because full virtualization and para-virtualization are complicated in nature. Processor makers like Intel and AMD provide their own CPU virtualization technologies, called Intel VT-x and AMD-V, which add an additional privilege mode to x86 processors. All the privileged and sensitive instructions are trapped into the hypervisor automatically. This technique removes the difficulty of implementing the binary translation of full virtualization and lets the operating system run in VMs without modification. Both technologies target privileged instructions with a new CPU execution mode feature that allows the VMM to run in a new root mode below Ring 0, also referred to as Ring 0P (privileged root mode), while the guest OS runs in Ring 0D (de-privileged non-root mode). Privileged and sensitive calls are set to trap automatically to the hypervisor running on the hardware, which removes the need for either binary translation or para-virtualization. Fig. 2.10.5 shows hardware assisted virtualization.

Fig. 2.10.5 Hardware Assisted Virtualization
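On Linux, whether the CPU exposes these hardware virtualization extensions can be checked from the flags advertised in /proc/cpuinfo : Intel VT-x appears as the vmx flag and AMD-V as the svm flag. The short Python sketch below reads that file; it is Linux-specific and is only a convenience check, as hypervisors perform their own, more thorough detection.

# Check /proc/cpuinfo for hardware virtualization support (Linux only).
# 'vmx' indicates Intel VT-x, 'svm' indicates AMD-V.

def hardware_virt_support(cpuinfo_path="/proc/cpuinfo"):
    with open(cpuinfo_path) as f:
        flags = set()
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
    if "vmx" in flags:
        return "Intel VT-x"
    if "svm" in flags:
        return "AMD-V"
    return None

support = hardware_virt_support()
print(support or "No hardware-assisted virtualization detected")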

2.11 Virtualization of Memory


Memory virtualization involves physical memory being shared and dynamically allocated to virtual machines. In a traditional execution environment, the operating system is responsible for maintaining the mappings of virtual memory to machine memory using page tables, where a page table is a single-stage mapping from virtual memory to machine memory. All recent x86 CPUs include a built-in Memory Management Unit (MMU) and a Translation Lookaside Buffer (TLB) to improve virtual memory performance. In a virtual execution environment, however, mappings are required from virtual memory to physical memory and from physical memory to machine memory; hence a two-stage mapping process is required.
A modern OS provides virtual memory support that is similar to memory virtualization. Virtualized memory is seen by the applications as a contiguous address space that is not tied to the underlying physical memory in the system. The operating system is responsible for mapping the virtual page numbers to physical page
numbers stored in page tables. To optimize the Virtual memory performance all modern
x86 CPUs include a Memory Management Unit (MMU) and a Translation Lookaside
Buffer (TLB). Therefore, to run multiple virtual machines with guest OSes on a single system, the MMU has to be virtualized, as shown in Fig. 2.11.1.


Fig. 2.11.1 Memory Virtualization

The guest OS is responsible for controlling the mapping of virtual addresses to the guest physical memory addresses, but the guest OS cannot have direct access to the actual machine memory. The VMM is responsible for mapping the guest physical memory to the actual machine memory, and it uses shadow page tables to accelerate the
mappings. The VMM uses TLB (Translation Lookaside Buffer) hardware to map the
virtual memory directly to the machine memory to avoid the two levels of translation on
every access. When the guest OS changes the virtual memory to physical memory
mapping, the VMM updates the shadow page tables to enable a direct lookup. The
hardware-assisted memory virtualization by AMD processor provides hardware
assistance to the two-stage address translation in a virtual execution environment by
using a technology called nested paging.
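The two-stage mapping can be illustrated with a deliberately simplified Python model : one table maps guest virtual pages to guest physical pages (maintained by the guest OS), a second maps guest physical pages to machine pages (maintained by the VMM), and a shadow table caches the combined guest-virtual-to-machine mapping so that each access needs only one lookup. The page numbers are arbitrary example values.

# Simplified model of two-stage memory translation under virtualization.
# Guest OS : guest virtual page  -> guest physical page
# VMM      : guest physical page -> machine (host) page
# Shadow page table : guest virtual page -> machine page (combined, one lookup)

guest_page_table = {0x1: 0x10, 0x2: 0x11, 0x3: 0x25}    # kept by the guest OS
vmm_page_table   = {0x10: 0x80, 0x11: 0x91, 0x25: 0xA3} # kept by the VMM

def build_shadow_table(guest_pt, vmm_pt):
    """Combine both stages so future lookups are a single step (shadow paging)."""
    return {gva: vmm_pt[gpa] for gva, gpa in guest_pt.items() if gpa in vmm_pt}

def translate(gva, shadow_pt):
    machine_page = shadow_pt.get(gva)
    if machine_page is None:
        raise KeyError(f"page fault: guest virtual page {hex(gva)} not mapped")
    return machine_page

shadow = build_shadow_table(guest_page_table, vmm_page_table)
print(hex(translate(0x2, shadow)))   # 0x91 : one lookup instead of two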

2.12 Virtualization of I/O Device


The virtualization of devices and I/O is a bit more difficult than CPU virtualization. It involves managing the routing of I/O requests between virtual devices and the shared physical hardware. Software-based I/O virtualization and management techniques can be used for device and I/O virtualization to enable a rich set of features and simplified management. The network is the integral component of the system that enables communication between different VMs. I/O virtualization provides virtual NICs and switches that create virtual networks between the virtual machines without generating traffic on, or consuming bandwidth of, the physical network. NIC teaming allows multiple physical NICs to appear as one and provides failover transparency for virtual machines. It allows virtual machines to be seamlessly relocated to different systems using VMware VMotion while keeping their existing MAC addresses. The key to effective I/O virtualization is to preserve the virtualization benefits with minimum CPU utilization. Fig. 2.12.1 shows device and I/O virtualization.


Fig. 2.12.1 Device and I/O virtualization

The virtual devices shown in Fig. 2.12.1 can effectively emulate well-known hardware and translate the virtual machine requests to the system hardware. Standardized device drivers help with virtual machine standardization. The portability offered by I/O virtualization allows all the virtual machines across platforms to be configured and run on the same virtual hardware, regardless of the actual physical hardware in the system. There are four methods to implement I/O virtualization, namely full device emulation, para-virtualization, direct I/O virtualization and self-virtualized I/O, which are explained as follows.
In full device emulation, the I/O devices are virtualized using emulation software. This method can emulate well-known, real-world devices. The emulation software is responsible for performing all the functions of a device or bus infrastructure, such as device enumeration, identification, interrupts and DMA, which are replicated in software. The software runs inside the VMM and acts as a virtual device. In this method, the I/O access


requests of the guest OS are trapped into the VMM, which interacts with the I/O devices. Multiple VMs can share a single hardware device and run concurrently. However, software emulation adds time to every I/O access, which is why it runs much slower than the hardware it emulates.
In the para-virtualization method of I/O virtualization, the split driver model is used, which consists of a frontend driver and a backend driver. It is used in the Xen hypervisor, with the drivers split between Domain 0 and Domain U : the frontend driver runs in Domain U while the backend driver runs in Domain 0. Both drivers interact with each other via a block of shared memory. The frontend driver is responsible for managing the I/O requests of the guest OSes, while the backend driver is responsible for managing the real I/O devices and multiplexing the I/O data of different VMs.
The para-virtualization method of I/O virtualization achieves better device
performance than full device emulation but with a higher CPU overhead.
In direct I/O virtualization, the virtual machine accesses I/O devices directly, without relying on any emulator in the VMM. It can give better I/O performance without the high CPU cost of the para-virtualization method. It was initially designed with a focus on networking for mainframes.
In the self-virtualized I/O method, the rich resources of a multicore processor are harnessed together. Self-virtualized I/O encapsulates all the tasks related to virtualizing an I/O device. It provides virtual devices with an associated access API to the VMs and a management API to the VMM, and defines one Virtual Interface (VIF) for every kind of virtualized I/O device.
The virtualized I/O interfaces include virtual network interfaces, virtual block devices (disks), virtual camera devices and others. The guest OS interacts with the virtual interfaces via device drivers. Each VIF carries a unique ID for identification in self-virtualized I/O and consists of two message queues : one for outgoing messages to the device and another for incoming messages from the device.
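The VIF described above maps naturally onto a small data structure : a unique ID plus one outgoing and one incoming message queue per virtualized device. The Python sketch below is illustrative only; the field and class names are assumptions, not part of any real self-virtualized I/O implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count

_vif_ids = count(1)                      # source of unique VIF identifiers

@dataclass
class VirtualInterface:
    """One VIF per kind of virtualized I/O device (illustrative structure only)."""
    kind: str                                               # "network", "block", "camera", ...
    vif_id: int = field(default_factory=lambda: next(_vif_ids))
    outgoing: deque = field(default_factory=deque)          # messages to the device
    incoming: deque = field(default_factory=deque)          # messages from the device

nic_vif = VirtualInterface(kind="network")
disk_vif = VirtualInterface(kind="block")
nic_vif.outgoing.append("send packet")                      # guest driver -> device
nic_vif.incoming.append("packet received")                  # device -> guest driver
print(nic_vif.vif_id, disk_vif.vif_id, list(nic_vif.outgoing), list(nic_vif.incoming))
```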
As there are many challenges associated with commodity hardware devices, multiple I/O virtualization techniques need to be combined to eliminate challenges such as system crashes during the reassignment of I/O devices, incorrect functioning of I/O devices and the high overhead of device emulation.

2.13 Virtualization Support and Disaster Recovery


In cloud computing, virtual machines are the containers of cloud services, and any service can run on top of them. Virtualization has therefore become a key aspect of cloud computing. Because virtualization abstracts the cloud services, a cloud user does


not have to bother about the physical servers through which the services are provisioned, and application developers do not have to worry about network issues or infrastructure problems such as scalability, latency and fault tolerance.
Virtualization software is used in most cloud computing systems to virtualize the hardware. It simulates hardware execution and can even run unmodified operating systems. Some of the prominent advantages of virtualization for cloud computing are :
 Supports legacy software applications and old operating systems.
 Provides a readily available development and deployment environment for developers to build cloud applications with a wide variety of tools and platforms.
 Provisions virtual machines on demand along with unmatched scalability.
 Provides flexibility for users and developers to use the platform.
 Provides high throughput, high availability and effective load balancing.
 Provides disaster recovery along with centralized resource and data management, and so on.
The functional representation of virtualization in cloud computing is shown in Fig. 2.13.1.
Some of the applications of virtualization are given as follows.
a) Virtualization for Public cloud platform
Today, every public cloud service provider uses virtualization to save physical resources, energy and manpower, while making cloud services easier to access, effective and reliable. Cloud service providers like AWS, Google or Microsoft give their customers the freedom to develop and deploy applications on their cloud platforms seamlessly. Because of that, today everyone is interested in using public cloud services, which are deployed on top of virtualization solutions.
b) Virtualization for Green Data Centers
As we know, because of the huge power consumption of physical servers and other equipment in data centers, IT power consumption has reached a remarkable figure, and many countries are facing an energy crisis to a great extent.
Virtualization can therefore be used to lower power consumption and effectively reduce cost in IT data centers. It makes a great impact on cost and power consumption by consolidating many physical servers into fewer ones.
This is where the concept of Green Data Centers comes into the picture, where storage and other virtualization mechanisms are used to minimize power, energy and cost, as well as the number of physical servers.


Fig. 2.13.1 Virtualization in Cloud computing

c) Virtualization for IaaS


VM technology has become increasingly ubiquitous. It allows users to create customized environments for cloud computing on top of the physical infrastructure. The use of VMs in clouds brings distinct benefits such as consolidating the workloads of underutilized servers onto fewer servers, allowing VMs to run legacy code without interfering


with APIs, improving the security of applications by building sandbox environments over VMs, and providing better QoS and performance isolation to applications over the virtualized cloud platform.

d) Virtualization for Disaster Recovery


In IT organizations, disaster recovery is a must-have technique which provides continuous and uninterrupted delivery of IT resources and services even in the case of hardware or other failures caused by natural disasters or other reasons. Disaster recovery
involves a collection of policies, tools and procedures to enable the recovery or
continuation of critical infrastructure resources and systems following a natural or
human-induced disaster.
Virtualization technology calls for an integrated disaster recovery program that allows one VM to be recovered by another VM. As we know, conventional disaster recovery from one physical machine to another is rather slow, complex and expensive. The total recovery time, which covers configuring the hardware, installing and configuring the operating system, installing the backup agents and restarting the physical machine, is very large. Therefore, to reduce the recovery time, VM platforms are used that cut the installation and configuration time for the operating system and eliminate the backup agents. Virtualization supports rapid disaster recovery through the encapsulation and cloning of VMs. VM cloning provides an efficient solution : for every VM running on a local server, a clone VM is created on a remote server. Of all the clone VMs, only one must be active; by default, the remote clone is kept in suspended mode, where it only receives updated data and changes its status when required. In the event of a failure of the original VM, the cloud platform should be able to activate this clone VM, using a snapshot of the VM to allow live migration in minimum time. The migrated VM will operate over a shared Internet connection. In the cloud, virtualization thus improves the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) achievable in case of a disaster or system restore.
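The clone-based recovery flow described above can be summarised in a few lines of illustrative Python : the suspended remote clone periodically receives updated data from the primary, and on failure it is activated, which keeps the RTO small and bounds the RPO by the replication interval. The class and timing logic here are a conceptual sketch, not a real cloud platform API.

```python
import time

class CloneVM:
    """Remote clone of a primary VM; suspended until a failover is needed."""
    def __init__(self):
        self.state = "suspended"
        self.last_sync = None            # time of the last replicated snapshot

    def receive_update(self, snapshot_time):
        self.last_sync = snapshot_time   # only updated data reaches the clone

    def activate(self):
        self.state = "active"            # resumed in place of the failed primary

def failover(clone, failure_time):
    start = time.time()
    clone.activate()
    rto = time.time() - start                 # time taken to bring the clone up
    rpo = failure_time - clone.last_sync      # data-loss window since the last sync
    return rto, rpo

clone = CloneVM()
clone.receive_update(snapshot_time=time.time())   # periodic replication from the primary
rto, rpo = failover(clone, failure_time=time.time())
print(f"clone state={clone.state}, RTO~{rto:.4f}s, RPO~{rpo:.4f}s")
```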
Summary

 The Service Oriented Architecture (SOA) expresses a perspective of software


architecture that defines the use of loosely coupled software services to support
the requirements of the business processes.
 The architectural style of SOA is defined by the World Wide Web Consortium (W3C) based on three parameters, namely logical perspective, message perspective and description orientation.


 Representational State Transfer (REST) is a software architectural style for


distributed system that defines a set of constraints to be used for creating Web
based services.
 Web services are loosely coupled (platform independent), contracted
components (behavior, input and output parameters, binding specifications are
public) that communicate in XML-based (open standard) interfaces.
 SOAP, WSDL and UDDI are the three essential components of web services.
 A web service protocol stack is a list of protocols that are utilized to define, find,
execute, and make web services collaborate with one another.
 The “Publish-Subscribe Model” describes a specific model for connecting source
and destination for a message transport.
 The term "Virtualization" refers to the creation of a virtual (rather than actual) version of a hardware platform, operating system, storage or network resource. It allows running multiple operating systems on a single physical machine, called the host machine.
 Based on the functionality of virtualized applications, there are five basic types
of virtualization namely Desktop Virtualization, Application Virtualization,
Server Virtualization, Storage Virtualization and Network Virtualization.
 There are five implementation levels of virtualization that are Instruction Set
Architecture (ISA) Level, Hardware Level, Operating System Level, Library
Support Level and Application Level.
 The hypervisor software provides two different structures of virtualization
namely Hosted structure (also called Type 2 Virtualization) and Bare-Metal
structure (also called Type 1 Virtualization).
 In the Hosted structure, the guest OS and applications run on top of a base or host OS with the help of the VMM, while in the Bare-Metal structure, the VMM is installed directly on top of the hardware, so no intermediate host OS is needed.
 Xen and KVM are two widely used virtualization tools, where Xen is a Type 1 hypervisor and KVM is a hosted hypervisor.
 There are two mechanisms of virtualization : binary translation with full virtualization, and para-virtualization.
 CPU virtualization is related to a range of protection levels, called rings, in which code can execute, while memory virtualization involves physical memory being shared and dynamically allocated to virtual machines.


 IO virtualization involves managing the routing of I/O requests between virtual


devices and the shared physical hardware.
 Virtualization technology calls for an integrated disaster recovery program that allows one VM to be recovered by another VM.

Two Marks Questions with Answers

Q. 1 What is Service Oriented Architecture ? AU : Dec.18


Ans. : Refer section 2.1.
Q.2 Justify Web and Web architectures are SOA based. AU : May-18
Ans. : SOA is an architectural style for building software applications that use services available in a network such as the web. Applications built using SOA are mostly web based and use the web architecture defined by the World Wide Web Consortium (W3C). These web applications are often distributed over networks and aim to make services interoperable, extensible and effective. The web and web services are the most common examples of the SOA model, which delivers a well-defined set of implementation choices for web architectures such as XML-based SOAP and the Web Services Description Language (WSDL).
Q.3 “Although virtualization is widely accepted today; it does have its limits”. Comment
on the statement. AU : May-18
Ans. : Although virtualization is widely accepted today; it does have its limitations
that are listed below.
 High upfront Investments : Organisations need to acquire resources beforehand to
implement Virtualization. Also, there might occur a need to incur additional
resources with time.
 Performance Issues : Although virtualization is an efficient technique and efficiency
can be increased by applying some techniques, there may be chances when the
efficiency is not as good as that of the actual physical systems.
 Licensing Issues : All software may not be supported on virtual platforms. Although
vendors are becoming aware of the increasing popularity of virtualization and have
started providing licenses for software to run on these platforms, the problem has
not completely vanished. Therefore, it is advised to check the licenses with the
vendor before using the software.


 Difficulty in Root Cause Analysis : With the addition of an additional layer in


virtualization, complexity gets increased. This increased complexity makes root
cause analysis difficult in case of unidentified problems.
Q.4 List the requirements of VMM. AU : Dec.-17
Ans. : The requirements of VMM or hypervisor are
 VMM must support efficient task scheduling and resource allocation techniques.

 VMM should provide an environment for programs which is essentially identical to


the original physical machine.
 A VMM should be in complete control of the system resources.

 Any program run under a VMM should exhibit a function identical to that which it
runs on the original physical machine directly.
 VMM must be tightly related to the architectures of processors

Q.5 Give the role of a VM. AU : Dec.-16

OR Give the basic operations of a VM. AU : May -17


Ans. : Virtualization allows running multiple operating systems on a single physical machine. Each instance of an operating system running inside it is called a Virtual Machine (VM). The main role of a VM is to provide the allocated host machine resources for running an operating system. The other roles of a VM are :
 Provide virtual hardware, including CPUs, memory, storage, hard drives, network
interfaces and other devices to run virtual operating system.
 Provide fault and security isolation at the hardware level.

 Preserve performance with advanced resource controls.

 Save the entire state of a virtual machine to files.

 Move and copy virtual machine data as easily as moving and copying files.

 Provision to migrate any virtual machine to any physical server.

Q.6 What is the impact of SOA in cloud ? AU : Dec.-19


Ans. : SOA and cloud computing share many common principles, as both work on the principle of services. The key challenges of cloud computing are security, integration, adaptation, agility and QoS aspects like performance, latency and availability. These challenges can be addressed with an SOA-based architecture using the concepts of service intermediation, service arbitrage and service aggregation. Because of SOA, cloud computing gains many advantages such as :
 Simple construction and maintenance of services


 Service reusability

 Ease of data exchange

 Platform integration

 Loosely coupled architecture

Q.7 Give the significance of virtualization. AU : Dec.-19


Ans. : As we know, large amounts of compute, storage and networking resources are needed to build a cluster, grid or cloud solution. These resources need to be aggregated in one place to offer a single system image. Therefore, the concept of virtualization comes into the picture, where resources can be aggregated together to rapidly fulfill requests for resource provisioning while presenting a single system image. Virtualization is a novel solution that can address application inflexibility and security concerns in existing physical machines while improving software manageability and achieving optimum resource utilization. In particular, every cloud solution has to rely on a virtualization solution for provisioning resources dynamically. Therefore, virtualization technology is one of the fundamental components of cloud computing. It provides a secure, customizable and isolated execution environment for running applications on abstracted hardware, and it is mainly used for providing different computing environments which, although virtual, appear to be physical. The different characteristics of virtualization are :
 Maximum resource utilization
 Reduces Hardware Cost
 Minimize the maintenance cost
 Supports Dynamic Load balancing
 Supports Server Consolidation
 Supports Disaster recovery
 Can run Legacy applications and can test Beta Softwares

Q.8 Define Virtualization. AU : May -19


Ans. : Virtualization is the creation of a virtual (rather than actual) version of a hardware platform, operating system, storage or network resource. It allows running multiple operating systems on a single physical machine, called the host machine. Each instance of an operating system is called a Virtual Machine (VM), and the operating system that runs inside a virtual machine is called the guest operating system.
Q.9 Define the term web service. AU : Dec.-18
Ans. : Web services are loosely coupled (platform independent), contracted
components (behavior, input and output parameters, binding specifications are public)
that communicate in XML-based (open standard) interfaces. When a web service is
deployed, different applications and other web services can find and invoke the


deployed service. The term "web service" frequently refers to an independent, self-describing, modular application intended to be utilized by and accessible to other software applications over the web.
Q.10 What are different characteristics of SOA ?
Ans. : The different characteristics of SOA are as follows :
 Provides interoperability between the services.

 Provides methods for service encapsulation, service discovery, service composition,


service reusability and service integration.
 Facilitates QoS (Quality of Services) through service contract based on Service Level
Agreement (SLA).
 Provides loosely coupled services.

 Provides location transparency with better scalability and availability.

 Ease of maintenance with reduced cost of application development and deployment.

Q.11 Define REST.


Ans. : Representational State Transfer (REST) is a software architectural style for
distributed system that defines a set of constraints to be used for creating web based
services. It is meant to provide interoperability between systems based on services
running on the Internet. The web services that follow the REST architectural style are
called RESTful Web services. The RESTful Web services allow the requesting systems to
access and manipulate textual representations of Web resources by using a uniform and
predefined set of stateless operations.
Q.12 Enlist different REST methods used in web services.
Ans. : Refer section 2.2.
Q.13 What is the role of WSDL in web services ?
Ans. : WSDL is an XML-based document which describes the interfaces and the set of operations supported by a web service in a standardized format. It is used for standardizing the representation of input and output parameters along with the operations of the service. The WSDL document contains information on the data types to be used, the messages to be exchanged, the operations performed by the web service and the communication protocol to be followed.
Q.14 What is Publish-subscribe model ?
Ans. : Refer section 2.4.
Q.15 Enlist the pros and cons of virtualization ?
Ans. : Refer section 2.5.2.

Q.16 What is server virtualization ?


Ans. : Server virtualization is the process of dividing a physical server into multiple
unique and isolated virtual servers by means of software. It partitions a single physical
server into the multiple virtual servers; each virtual server can run its own operating
system and applications independently. The virtual server is also termed as virtual
machine. The consolidation helps in running many virtual machines under a single
physical server. Each virtual machine shares the hardware resources from physical
server that leads to better utilization of the physical servers’ resources. The resources
utilized by virtual machine include CPU, memory, storage, and networking. The
hypervisor is the operating system or software that runs on the physical machine to
perform server virtualization. The hypervisor running on physical server is responsible
for providing the resources to the virtual machines. Each virtual machine runs
independently of the other virtual machines on the same box with different operating
systems that are isolated from each other. Popular server virtualization platforms are VMware vSphere, Citrix XenServer, Microsoft Hyper-V and Red Hat Enterprise Virtualization.
Q.17 Compare between different implementation levels of virtualization.
Ans. : The comparison between different implementation levels of virtualization is
given in following table.
Implementation Level : Performance | Application Flexibility | Implementation Complexity | Application Isolation
Instruction Set Architecture (ISA) Level : Very Poor | Very Good | Medium | Medium
Hardware Abstraction Level (HAL) : Very Good | Medium | Very Good | Good
Operating System Level : Very Good | Poor | Medium | Poor
Library Level : Medium | Poor | Poor | Poor
Application Level : Poor | Poor | Very Good | Very Good

Q.18 Enlist advantages and disadvantages of Bare-Metal structure.


Ans. : The advantages of Bare-Metal structure are
 It is faster in performance and more efficient to use.

 It provides enterprise features like high scalability, disaster recovery and high
availability.
 It has high processing power due to the resource pooling.


 It has lower overhead or maintenance cost.

 It provides ease of backup and recovery.

 It provides built-in fault-tolerance mechanisms.

 It has improved mobility and security.

The disadvantages of Bare-Metal structure are


 It has limited hardware support and a poor stack of device drivers.

 It has a high implementation cost.

 It requires specialized servers to install and run hypervisor and do not run on user
workstations.
 In some cases, it becomes complex for management.

Q.19 What is disaster recovery ?


Ans. : Disaster recovery is a must-have technique which provides continuous and
uninterrupted delivery of IT resources and services even in case of hardware or other
failures due to natural disasters or any other reasons. Disaster recovery involves a
collection of policies, tools and procedures to enable the recovery or continuation of
critical infrastructure resources and systems following a natural or human-induced
disaster.
Q.20 What is Xen ?
Ans. : Xen is an open source Bare-Metal (Type 1) hypervisor developed at Cambridge University. It runs on top of the hardware without needing a host operating system. The absence of a host OS eliminates the need for pass-through permission by the hypervisor. Xen is a microkernel hypervisor, which separates policy from mechanism. It provides a virtual environment located between the hardware and the OS. As the Xen hypervisor runs directly on the hardware, many guest operating systems can run on top of it. The operating system platforms supported as a guest OS by the Xen hypervisor include Windows, Linux, BSD and Solaris.

Long Answered Questions

Q.1 “Virtualization is the wave of the future”. Justify. Explicate the process of CPU,
Memory and I/O device virtualization in data center. AU : May -18

OR Explain the virtualization of CPU, memory and I/O devices. AU : May-19


Ans. : Refer section 2.10, 2.11 and 2.12.
Q.2 Explain virtualization of I/O devices with examples. AU : Dec.-18
Ans. : Refer section 2.12.

Q.3 What is virtualization ? Describe para and full virtualization architectures, compare
and contrast them. AU : Dec.-17
Ans. : Refer sections 2.9.2.2 and 2.10

Comparison between Para virtualization and Full virtualization

Para virtualization | Full virtualization
Guest OS is aware of the host and of being virtualized. | Guest OS is unaware of the host and of being virtualized.
Modification is required for the guest OS. | No modification is required for the guest OS.
Limited support, for fewer OSes. | Wide support for most OSes.
Better performance than full virtualization. | Lower performance than para virtualization.
Lower virtualization overhead. | Higher virtualization overhead.
Supports Type II or hosted hypervisors. | Supports Type I native or Bare-Metal hypervisors.
Runs over the host operating system, which runs on the host hardware. | Runs directly over the host machine hardware.
KVM is a well-known example of para virtualization. | VMware ESXi and Microsoft Virtual Server are examples of full virtualization.

Q.4 Illustrate the architecture of virtual machine and brief about the operations.
AU : Dec.-16
Ans. : Refer section 2.8, structures of virtualization (Hosted and Bare-Metal).
Q.5 Write short note on Service Oriented Architecture. AU : Dec.-16
Ans. : Refer section 2.1.
Q.6 Discuss how virtualization is implemented in different layers. AU : May-17
Ans. : Refer section 2.7, Implementation Levels of Virtualization.
Q.7 Analyse how the virtualization technology supports the cloud computing. AU : May-19
Ans. : Refer section 2.13.
Q.8 Write a detailed note on web services.
Ans. : Refer section 2.3.
Q.9 Explain in detail web services protocol stack and publish-subscribe models with
respect to web services.
Ans. : Refer section 2.3.1 and 2.4.


Q.10 Write a detailed note on virtualization and its structures.


Ans. : Refer section 2.5 and 2.8.
Q.11 Explain different types of virtualization with examples.
Ans. : Refer section 2.6.
Q.12 What are different mechanisms of virtualizations ?
Ans. : Refer section 2.9.2.
Q.13 Explain in brief Xen architecture.
Ans. : Refer section 2.9.1 (a).
Q.14 Explain in brief KVM architecture.
Ans. : Refer section 2.9.1 (b).



3 Cloud Architecture, Services and Storage
Syllabus
Layered cloud architecture design - NIST cloud computing reference architecture - Public, Private
and Hybrid clouds - IaaS - PaaS - SaaS - Architectural design challenges - Cloud storage -
Storage-as-a-Service - Advantages of cloud storage - Cloud storage providers - S3.

Contents
3.1 Cloud Architecture Design
3.2 NIST Cloud Computing Reference Architecture
3.3 Cloud Deployment Models
3.4 Cloud Service Models
3.5 Architectural Design Challenges
3.6 Cloud Storage
3.7 Storage as a Service
3.8 Advantages of Cloud Storage
3.9 Cloud Storage Providers
3.10 Simple Storage Service (S3)


3.1 Cloud Architecture Design


Cloud architecture design is an important aspect while designing a cloud. Simplicity in cloud services attracts cloud users and makes a positive business impact; therefore, cloud architecture design plays an important role in developing simple and user-friendly services. Every cloud platform is intended to meet four essential design goals : scalability, reliability, efficiency and virtualization. To achieve these goals, certain requirements have to be considered. The basic requirements for cloud architecture design are given as follows :
 The cloud architecture design must provide automated delivery of cloud services
along with automated management.
 It must support latest web standards like Web 2.0 or higher and REST or RESTful
APIs.
 It must support very large - scale HPC infrastructure with both physical and virtual
machines.
 The architecture of cloud must be loosely coupled.
 It should provide easy access to cloud services through a self - service web portal.
 Cloud management software must be able to receive user requests efficiently, find the correct resources, and then call the provisioning services which invoke the resources in the cloud.
 It must provide enhanced security for shared access to the resources from data
centers.
 It must use cluster architecture for getting the system scalability.
 The cloud architecture design must be reliable and flexible.
 It must provide efficient performance and faster speed of access.
Today's clouds are built to support many tenants (cloud devices) over resource pools and large data volumes, so hardware and software both play an important role in achieving that. Rapid development in multicore CPUs, memory chips and disk arrays on the hardware side has made it possible to create data centers with large volumes of storage space almost instantly, while developments in software standards like Web 2.0 and SOA have immensely helped in developing cloud services. The Service-Oriented Architecture (SOA) is also a crucial component used in the delivery of SaaS. The web service software detects the status of each node server joining and leaving, and performs the appropriate tasks accordingly. Virtualization of the infrastructure allows for quick cloud delivery and recovery from disasters. In recent cloud platforms, resources are

built into data centers which are typically owned and operated by a third-party provider. The next section explains the layered architecture design for a cloud platform.

3.1.1 Layered Cloud Architecture Design


The layered architecture of a cloud is composed of three basic layers called
infrastructure, platform and application. These three levels of architecture are
implemented with virtualization and standardization of cloud - provided hardware and
software resources. This architectural design facilitates public, private and hybrid cloud
services that are conveyed to users through networking support over the internet and the
intranets. The layered cloud architecture design is shown in Fig. 3.1.1.

Fig. 3.1.1 : Layered cloud architecture design

In the layered architecture, the foundation layer is the infrastructure layer, which is responsible for providing different Infrastructure as a Service (IaaS) components and related services. It is the first layer to be deployed, before the platform and application layers, to provide IaaS services and to run the other two layers. The infrastructure layer consists of virtualized services for computing, storage and networking. It is responsible for provisioning infrastructure components like compute (CPU and memory), storage, network and I/O resources to run


virtual machines or virtual servers along with virtual storage. The abstraction of these hardware resources is intended to provide flexibility to the users. Internally, virtualization performs automated resource provisioning and optimizes the process of managing resources. The infrastructure layer acts as the foundation for building the second layer, the platform layer, which supports PaaS services.
The platform layer is responsible for providing a readily available development and deployment platform for web applications to cloud users, without requiring them to install anything on a local device. The platform layer has a collection of software tools for developing, deploying and testing software applications. This layer provides an environment for users to create their applications, test operation flows, track performance and monitor execution results. The platform must ensure scalability, reliability and security. In this layer, the virtualized cloud platform acts as "application middleware" between the cloud infrastructure and the application layer of the cloud. The platform layer is the foundation for the application layer.
A collection of all the software modules required for SaaS applications forms the application layer. This layer is mainly responsible for on-demand application delivery. In this layer, software applications include day-to-day office management software used for information collection, document processing, calendaring and authentication. Enterprises also use the application layer extensively in business marketing, sales, Customer Relationship Management (CRM), financial transactions and Supply Chain Management (SCM). It is important to remember that not all cloud services are limited to a single layer; many applications can require resources from mixed layers. After all, with their dependency relationship, the three layers are constructed in a bottom-up approach. From the perspective of the user, the services at the various levels need specific amounts of vendor support and resource management for their functionality. In general, SaaS needs the provider to do the most work, PaaS is in the middle and IaaS requires the least. The best example of the application layer is Salesforce.com's CRM service, where not only the hardware at the bottom layer and the software at the top layer are supplied by the vendor, but also the platform and software tools for user application development and monitoring.

3.2 NIST Cloud Computing Reference Architecture


In this section, we will examine and discuss the reference architecture model given by
the National Institute of Standards and Technology (NIST). The model offers approaches
for secure cloud adoption while contributing to cloud computing guidelines and
standards.


The NIST team works closely with leading IT vendors, standards developers, industries and other governmental agencies at a global level to support effective cloud computing security standards and their further development. It is important to note that this NIST cloud reference architecture does not belong to any specific vendor product, service or reference implementation, nor does it prevent further innovation in cloud technology.
The NIST reference architecture is shown in Fig. 3.2.1.

Fig. 3.2.1 : Conceptual cloud reference model showing different actors and entities

From Fig. 3.2.1, note that the cloud reference architecture includes five major actors :
 Cloud consumer
 Cloud provider
 Cloud auditor
 Cloud broker
 Cloud carrier
Each actor is an organization or entity that plays an important role in a transaction or process, or performs some important task in cloud computing. The interactions between these actors are illustrated in Fig. 3.2.2.


Fig. 3.2.2 : Interactions between different actors in a cloud

Now, understand that a cloud consumer can request cloud services directly from a
CSP or from a cloud broker. The cloud auditor independently audits and then contacts
other actors to gather information. We will now discuss the role of each actor in detail.

3.2.1 Cloud Consumer


A cloud consumer is the most important stakeholder, as the cloud service is built to support the consumer. A cloud consumer is a person or organization that maintains a business relationship with, and uses the services of, a cloud provider. The consumer browses the service catalogue of the cloud provider, requests an appropriate service or sets up service contracts for using the service, and is billed for the services used.
Some typical usage scenarios include :

Example 1 : Cloud consumer requests the service from the broker instead of directly
contacting the CSP. The cloud broker can then create a new service by combining
multiple services or by enhancing an existing service. Here, the actual cloud provider is
not visible to the cloud consumer. The consumer only interacts with the broker. This is
illustrated in Fig. 3.2.3.

Fig. 3.2.3 : Cloud broker interacting with cloud consumer

Example 2 : In this scenario, the cloud carrier provides for connectivity and transports
cloud services to consumers. This is illustrated in Fig. 3.2.4.


Fig. 3.2.4 : Scenario for cloud carrier

In Fig. 3.2.4, the cloud provider participates by arranging two SLAs. One SLA is with the cloud carrier (SLA2) and the second SLA is with the consumer (SLA1). Here, the cloud provider has an arrangement (SLA) with the cloud carrier to provide secured, encrypted connections. This ensures that the services are available to the consumer at a consistent level to fulfil service requests. Here, the provider can specify the requirements, such as flexibility, capability and functionalities, in SLA2 to fulfil the essential service requirements of SLA1.

Example 3 : In this usage scenario, the cloud auditor conducts independent evaluations
for a cloud service. The evaluations will relate to operations and security of cloud service
implementation. Here the cloud auditor interacts with both the cloud provider and
consumer, as shown in Fig. 3.2.5.

Fig. 3.2.5 : Usage scenario involving a cloud auditor

In all the given scenarios, the cloud consumer plays the most important role. Based on
the service request, the activities of other players and usage scenarios can differ for other
cloud consumers. Fig. 3.2.6 shows an example of available cloud services types.
In Fig. 3.2.6, note that SaaS applications are available over a network to all consumers.
These consumers may be organisations with access to software applications, end users,
app developers or administrators. Billing is based on the number of end users, the time of
use, network bandwidth consumed and for the amount or volume of data stored.


Fig. 3.2.6 : Example of cloud services available to cloud consumers

PaaS consumers can utilize tools, execution resources, development IDEs made
available by cloud providers. Using these resources, they can test, develop, manage,
deploy and configure many applications that are hosted on a cloud. PaaS consumers are
billed based on processing, database, storage, network resources consumed and for the
duration of the platform used.
On the other hand, IaaS consumers can access virtual computers, network-attached storage, network components, processor resources and other computing resources, on which they can deploy and run arbitrary software. IaaS consumers are billed based on the amount and duration of hardware resources consumed, the number of IP addresses, the volume of data stored, network bandwidth and CPU hours used for a certain duration.
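A toy pay-per-use bill for an IaaS consumer can be computed from the metrics listed above (instance hours, storage volume, bandwidth, IP addresses). The unit rates below are invented for illustration only; real providers publish their own rate cards.

```python
# Hypothetical unit prices; real CSP rate cards differ.
RATES = {
    "instance_hours": 0.05,   # per instance-hour
    "storage_gb": 0.02,       # per GB stored per month
    "bandwidth_gb": 0.09,     # per GB transferred
    "public_ips": 3.00,       # per reserved IP address per month
}

def monthly_bill(usage):
    """usage: metric name -> quantity metered by the provider for the month."""
    return sum(RATES[metric] * quantity for metric, quantity in usage.items())

print(monthly_bill({"instance_hours": 720, "storage_gb": 100,
                    "bandwidth_gb": 50, "public_ips": 2}))   # pay only for what was used
```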


3.2.2 Cloud Provider


Cloud provider is an entity that offers cloud services to interested parties. A cloud
provider manages the infrastructure needed for providing cloud services. The CSP also
runs the software to provide services and organizes the service delivery to cloud
consumers through networks.
SaaS providers then deploy, configure, maintain and update all operations of the
software application on the cloud infrastructure, in order to ensure that services are
provisioned and to fulfill cloud consumer service requests. SaaS providers assume most
of the responsibilities associated with managing and controlling applications deployed on
the infrastructure. On the other hand, SaaS consumers have no or limited administrative
controls.
PaaS cloud providers manage the computing infrastructure and ensure that the
platform runs the cloud software and implements databases, appropriate runtime
software execution stack and other required middleware elements. They support
development, deployment and the management of PaaS consumers by providing them
with necessary tools such as IDEs, SDKs and others. PaaS providers have complete
control of applications, settings of the hosting environment, but have lesser control over
the infrastructure lying under the platform, network, servers, OS and storage.
Now, the IaaS CSP aggregates physical cloud resources such as networks, servers,
storage and network hosting infrastructure. The provider operates the cloud software and
makes all compute resources available to IaaS cloud consumer via a set of service
interfaces, such as VMs and virtual network interfaces. The IaaS cloud provider will have
control over the physical hardware and cloud software to enable provisioning and
possible infrastructure services.
The main activities of a cloud provider can be viewed in Fig. 3.2.7.

Fig. 3.2.7 : Major activities of a cloud provider


The major activities of a cloud provider include :


 Service deployment : Service deployment refers to provisioning private, public,
hybrid and community cloud models.
 Service orchestration : Service orchestration implies the coordination, management
of cloud infrastructure and arrangement to offer optimized capabilities of cloud
services. The capabilities must be cost-effective in managing IT resources and must
be determined by strategic business needs.
 Cloud services management : This activity involves all service-related functions
needed to manage and operate the services requested or proposed by cloud
consumers.
 Security : Security, which is a critical function in cloud computing, spans all layers
in the reference architecture. Security must be enforced end-to-end. It has a wide
range from physical to application security. CSPs must take care of security.
 Privacy : Privacy in cloud must be ensured at different levels, such as user privacy,
data privacy, authorization and authentication and it must also have adequate
assurance levels. Since clouds allow resources to be shared, privacy challenges are a
big concern for consumers using clouds.

3.2.3 Cloud Auditor


The cloud auditor performs the task of independently evaluating cloud service
controls to provide an honest opinion when requested. Cloud audits are done to validate
standards conformance by reviewing the objective evidence. The auditor will examine
services provided by the cloud provider for its security controls, privacy, performance,
and so on.

3.2.4 Cloud Broker


The cloud broker collects service requests from cloud consumers and manages the use,
performance, and delivery of cloud services. The cloud broker will also negotiate and
manage the relationship between cloud providers and consumers. A cloud broker may
provide services that fall into one of the following categories :
 Service intermediation : Here the cloud broker will improve some specific
capabilities, and provide value added services to cloud consumers.
 Service aggregation : The cloud broker links and integrates different services into
one or more new services.

 Service Arbitrage : This is similar to aggregation, except for the fact that services
that are aggregated are not fixed. In service arbitrage, the broker has the liberty to
choose services from different agencies.

3.2.5 Cloud Carrier


The cloud carrier tries to establish connectivity and transports cloud services between
a cloud consumer and a cloud provider. Cloud carriers offer network access for
consumers, by providing telecommunication links for accessing resources using other
devices (laptops, computers, tablets, smartphones, etc.). Usually, a transport agent is an entity that offers telecommunication carrier services to a business organization for accessing resources. The cloud provider will set up SLAs with the cloud carrier to ensure that the carrier transport is consistent with the level of SLA promised to the consumers.
secure and dedicated high - speed links with cloud providers and between different cloud
entities.

3.3 Cloud Deployment Models


Cloud deployment models are defined according to where the computing infrastructure resides and who controls the infrastructure. NIST has classified cloud deployment models into four categories, namely :
 Public cloud
 Private cloud
 Hybrid cloud
 Community cloud
They describe the way in which users can access the cloud services. Each cloud
deployment model fits different organizational needs, so it's important that you pick a
model that will suit your organization's needs. The four deployment models are
characterized based on the functionality and accessibility of cloud services. The four
deployment models of cloud computing are shown in Fig. 3.3.1.


Fig. 3.3.1 : Four deployment models of cloud computing

3.3.1 Public Cloud


Public cloud services run over the internet. Therefore, users who want cloud services must have an internet connection on their local device, such as a thin client, thick client, mobile, laptop or desktop. Public cloud services are managed and maintained by Cloud Service Providers (CSPs) or Cloud Service Brokers (CSBs). They are often offered with utility-based pricing, such as a subscription or pay-per-use model, and are provided through the internet and APIs. This model allows users to easily access the services without purchasing any specialized hardware or software. Any device which has a web browser and internet connectivity can be a public cloud client. Popular public cloud service providers are Amazon Web Services, Microsoft Azure, Google App Engine, Salesforce etc.

Advantages of public cloud


1. It saves the capital cost of purchasing server hardware, operating systems and application software licenses.
2. There is no need for server administrators to take care of servers, as they are kept in the CSP's data center and managed by the CSP.
3. No training is required to use or access the cloud services.
4. No upfront or setup cost is required.
5. A user gets easy access to multiple services under a single self-service portal.
6. Users have a choice to compare and select between providers.


7. It is cheaper than an in-house cloud implementation because users pay only for what they have used.
8. The resources are easily scalable.

Disadvantages of public cloud


1. There is a lack of data security, as data is stored in a public data center and managed by third-party data center vendors; therefore, a user's confidential data may be compromised.
2. Recovery of backup data is expensive.
3. Users never come to know where (at which location) their data gets stored, how it can be recovered and how many replicas of the data have been created.

3.3.2 Private Cloud


Private cloud services are used by organizations internally and most of the time run over an intranet connection. They are designed for a single organization; therefore, anyone within the organization can easily access data, services and web applications through local servers and the local network, but users outside the organization cannot access them. These cloud services are hosted on an intranet, so only users connected to that intranet get access to them. The infrastructure for a private cloud is fully managed and maintained by the organization itself. It is much more secure than a public cloud, as it gives local administrators the freedom to write their own security policies for user access. It also provides a good level of trust and privacy to users. Private clouds are more expensive than public clouds due to the capital expenditure involved in acquiring and maintaining them. Well-known private cloud platforms are OpenStack, OpenNebula, Eucalyptus, VMware private cloud etc.

Advantages of private cloud


1. Speed of access is very high, as services are provided through local servers over the local network.
2. It is more secure than a public cloud, as the security of cloud services is handled by the local administrator.
3. It can be customized as per the organization's needs.
4. It does not require an internet connection for access.
5. It is easier to manage than a public cloud.


Disadvantages of private cloud

1. Implementation cost is very high, as the setup involves purchasing and installing servers, hypervisors and operating systems.
2. It requires administrators for managing and maintaining the servers.
3. The scope for scalability is very limited.

3.3.3 Hybrid Cloud


Hybrid cloud services are composed of two or more clouds and offer the benefits of multiple deployment models. A hybrid cloud mostly comprises an on-premise private cloud and an off-premise public cloud, to leverage the benefits of both and allow users inside and outside the organization to access it. The hybrid cloud provides flexibility such that users can migrate their applications and services from the private cloud to the public cloud and vice versa. It has become the most favored model in the IT industry because of its prominent features like mobility, customized security, high throughput, scalability, disaster recovery, easy backup and replication across clouds, high availability and cost efficiency. Popular hybrid clouds are AWS with Eucalyptus, AWS with VMware cloud, Google cloud with Nutanix etc.
The limitations of the hybrid cloud are compatibility issues between deployment models, vendor lock-in, the need for common cloud management software and the management of separate cloud platforms.

3.3.4 Community Cloud


The community cloud is basically a combination of one or more public, private or hybrid clouds, which are shared by many organizations for a single cause. A community cloud is set up between multiple organizations that have the same objective. The infrastructure of a community cloud is shared by several organizations within a specific community with common security and compliance objectives, and is managed either by third-party organizations or internally. Well-known community clouds are Salesforce, Google community cloud etc.

3.3.5 Comparison between various Cloud Deployment Models


The comparison between the different deployment models of cloud computing is given in Table 3.3.1.


Feature : Public Cloud | Private Cloud | Hybrid Cloud | Community Cloud
1. Scalability : Very High | Limited | Very High | Limited
2. Security : Less Secure | Most Secure | Very Secure | Less Secure
3. Performance : Low to Medium | Good | Good | Medium
4. Reliability : Medium | High | Medium to High | Medium
5. Upfront Cost : Low | Very High | Medium | Medium
6. Quality of Service : Low | High | Medium | Medium
7. Network : Internet | Intranet | Intranet and Internet | Internet
8. Availability : For general public | For organization's internal staff | For general public and organization's internal staff | For community members
9. Example : Windows Azure, AWS etc. | OpenStack, VMware cloud, CloudStack, Eucalyptus etc. | Combination of OpenStack and AWS | Salesforce community

Table 3.3.1 : Comparison between various Cloud Deployment Models

3.4 Cloud Service Models


Cloud computing is meant to provide a variety of services and applications to users over the internet or an intranet. The most widespread services of cloud computing are categorised into three service classes, which are called cloud service models, cloud reference models or working models of cloud computing. They are based on the abstraction level of the offered capabilities and the service model of the CSPs. The various service models are :
 Infrastructure as a Service (IaaS)
 Platform as a Service (PaaS)
 Software as a Service (SaaS)
The three service models of cloud computing and their functions are shown in
Fig. 3.4.1.


Fig. 3.4.1 : Cloud service models

From Fig. 3.4.1, we can see that Infrastructure as a Service (IaaS) is the bottommost layer in the model and Software as a Service (SaaS) lies at the top. IaaS has the lowest level of abstraction and visibility, while SaaS has the highest level of visibility.
Fig. 3.4.2 represents the cloud stack organization from physical infrastructure to applications. In this layered architecture, the abstraction levels can be seen, where higher-layer services include the services of the underlying layers.

Fig. 3.4.2 : The cloud computing stack

As you can see in Fig. 3.4.2, the three services, IaaS, PaaS and SaaS, can exist
independent of one another or may combine with one another at some layers. Different
layers in every cloud computing model are either managed by the user or by the vendor


(provider). In the case of the traditional IT model, all the layers or levels are managed by the user, because he or she is solely responsible for managing and hosting the applications. In the case of IaaS, the top five layers are managed by the user, while the four lower layers (virtualisation, server hardware, storage and networking) are managed by the vendor or provider. So, here, the user is accountable for managing everything from the operating system up to the applications, including databases and application security. In the case of PaaS, the user needs to manage only the application, and all the other layers of the cloud computing stack are managed by the vendor. Lastly, SaaS abstracts the user from all the layers, as all of them are managed by the vendor and the user is responsible only for using the application.
The core middleware manages the physical resources, and the VMs are deployed on top of them. This deployment provides the features of pay-per-use services and multi-tenancy. Infrastructure services support cloud development environments and provide capabilities for application development and implementation, offering different libraries, programming models, APIs, editors and so on to support application development. When this deployment is ready for the cloud, it can be used by end users and organisations. With this idea, let us further explore the different service models.

3.4.1 Infrastructure as a Service (IaaS)


Infrastructure-as-a-Service (IaaS) can be defined as the use of servers, storage,
computing power, network and virtualization to form utility like services for users. It is a
cloud service model that provides hardware resources virtualized in the cloud. It
provides virtual computing resources to the users through resource pool. In IaaS, the CSP
owns all equipment, such as servers, storage disks, and network infrastructure.
Developers use the IaaS service model to create virtual hardware on which the
applications and/or services are developed. We can understand that an IaaS cloud provider creates hardware utility services and makes them available for users to provision virtual resources as per need. Developers can create virtual private storage,
virtual private servers, and virtual private networks by using IaaS. The private virtual
systems contain software applications to complete the IaaS solution. The infrastructure of
IaaS consists of communication networks, physical compute nodes, storage solutions and
the pool of virtualized computing resources managed by a service provider. IaaS provides
users with a web-based service that can be used to create, destroy and manage virtual
machines and storage. It is a way of delivering cloud computing infrastructure like
Virtual servers, Virtual storage, Virtual network and Virtual operating systems as an on-
demand service. Instead of purchasing extra servers, softwares, datacenter space or


network equipment, IaaS enables on-demand provisioning of computational resources in the form of virtual machines in a cloud data center. Some key providers of IaaS are Amazon Web Services (AWS), Microsoft Azure, GoGrid, Joyent, Rackspace etc., and some of the private cloud software stacks through which IaaS can be set up are OpenStack, Apache CloudStack, Eucalyptus and VMware vSphere.
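As a hedged illustration of such on-demand provisioning on one IaaS platform, the snippet below uses the AWS boto3 SDK to launch a single virtual machine. The AMI ID and region are placeholders, and valid AWS credentials are assumed to be configured; other IaaS providers expose similar, but not identical, APIs.

```python
import boto3

# Placeholder region and AMI ID; valid AWS credentials must already be configured.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical machine image
    InstanceType="t2.micro",           # small pay-per-use instance size
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])   # ID of the newly provisioned VM
```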
You must understand that the virtualised resources are
mapped to real systems in IaaS. This can be understood as
when a user with IaaS service makes a request for a service
from virtual systems, that request is redirected to the
physical server that does the actual work. The structure of
the IaaS model is shown in Fig. 3.4.3.
In IaaS service delivery, the workload is the fundamental component of the virtualised client. It simulates the capacity of a physical server to perform work; hence, the work done is measured, for example, as the total number of Transactions Per Minute (TPM). Note that the workload also has other
attributes, such as disk I/O (determined by I/O operations per second), RAM used in MB, latency, network throughput and others.

Fig. 3.4.3 : Components in the IaaS service model (Cloud Security Alliance)

In the case of hosted applications, the client runs on a dedicated server inside a server rack. It may also run on a standalone server. In cloud computing, the provisioned server is known as an instance (or server instance), which is reserved by a customer along with the computing resources required to fulfil their requirements. The user reserves an equivalent machine required to run its workloads.
The IaaS infrastructure runs the instances of the server in the data centre offering the
service. The resources for this server instance are drawn from a mix of virtualised
systems, RAID disks, network and interface capacity. These are physical systems
partitioned into smaller logical units.
The client in IaaS is allocated its own private network. For example, Amazon EC2
enables this service to behave such that each server has its own separate network unless
the user creates a virtual private cloud. If the EC2 deployment is scaled by adding
additional networks on the infrastructure, it is easy to logically scale, but this can create
an overhead as traffic gets routed between logical networks.
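To make the idea of on-demand IaaS provisioning concrete, the following is a minimal sketch using the AWS SDK for Python (boto3) to create and start a virtual server instance programmatically. The region, AMI ID and key pair name shown here are placeholders, not values taken from this text; a real account would supply its own.

    # Minimal sketch of IaaS provisioning with boto3 (AWS SDK for Python).
    # The region, AMI ID and key pair below are placeholders.
    import boto3

    ec2 = boto3.resource("ec2", region_name="us-east-1")

    instances = ec2.create_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder machine image
        InstanceType="t2.micro",           # small general-purpose VM size
        KeyName="my-key-pair",             # placeholder key pair for SSH access
        MinCount=1,
        MaxCount=1,
    )

    instance = instances[0]
    instance.wait_until_running()          # block until the VM is provisioned
    instance.reload()
    print("Launched", instance.id, "at", instance.public_ip_address)

    # Terminating the instance releases the hardware back to the pool,
    # which is what makes the pay-per-use model possible.
    # instance.terminate()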


In IaaS, the customer has control over the OS, storage and installed applications, but
has limited control over network components. The user cannot control the underlying
cloud infrastructure. Services offered by IaaS include server hosting, computer hardware,
OS, virtual instances, load balancing, web servers and bandwidth provisioning. These
services are useful when demand is volatile, when a new business launch needs
computing resources, when a company does not want to buy hardware, or when an
organisation wants to expand.

3.4.2 Platform as a Service


Platform as a Service (PaaS) can be defined as a computing platform that allows the
user to create web applications quickly and easily, without worrying about buying and
maintaining the software and infrastructure. Platform-as-a-Service provides tools for
developing, deploying and testing software, along with middleware solutions, databases,
programming languages and APIs for developers to build custom applications without
installing or configuring the development environment. PaaS provides a platform to run
web applications without installing them on a local machine, i.e. applications written by
users can be run directly on the PaaS cloud. It is built on top of the IaaS layer. PaaS
realizes many unique benefits, such as utility computing, hardware virtualization,
dynamic resource allocation, low investment costs and a pre-configured development
environment. It typically has all the application services required by the client deployed
on it. The challenge associated with PaaS is compatibility, i.e. if a user wants to migrate
services from one provider to another, they have to check the compatibility of the
execution engine and the cloud APIs first. Some key providers of PaaS clouds are Google
App Engine, Microsoft Azure, NetSuite, Red Hat OpenShift, etc.
The PaaS model includes the software environment where the developer can create
custom solutions using the development tools available with the PaaS platform. The
components of a PaaS platform are shown in Fig. 3.4.4. Platforms can support specific
development languages, frameworks for applications and other constructs. Also, PaaS
provides tools and development environments to design applications. Usually, a fully
Integrated Development Environment (IDE) is available as a PaaS service. For PaaS to be
a cloud computing service, the platform supports user interface development; it also
supports standards such as HTML, JavaScript, rich media and so on.

Fig. 3.4.4 : Components of PaaS


In this model, users interact with the software and append and retrieve data, perform
an action, obtain results from a process task and perform other actions allowed by the
PaaS vendor. In this service model, the customer does not have any responsibility to
maintain the hardware, the software or the development environment. The applications
created are the only interaction between the customer and the PaaS platform. The PaaS
cloud provider owns responsibility for all operational aspects, such as maintenance,
updates, management of resources and the product lifecycle. A PaaS customer can control
services such as device integration, session management, content management,
sandboxing and so on. In addition to these services, the customer can also control
Universal Description, Discovery and Integration (UDDI), a platform-independent,
XML-based registry that allows registration and discovery of web service applications.
Let us consider the example of Google App Engine. The platform allows developers to
program apps using Google's published APIs. In this platform, Google defines the tools
to be used within the development framework, the file system structure and the data
stores. A similar PaaS offering comes from Force.com, another vendor, which is based on
the Salesforce.com development platform for the latter's SaaS offerings. Force.com
provides an add-on development environment.
In PaaS, developers can build an app with Python and the Google APIs. Here, the PaaS
vendor offers a complete solution to the developer; for instance, Google acts as a PaaS
vendor and offers web service apps to users. Other examples are Google Earth, Google
Maps, Gmail, etc.
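As a hedged illustration of the PaaS model, the sketch below shows roughly what a minimal Python web application for Google App Engine's standard environment could look like. The developer supplies only the application code and a small configuration file; the platform supplies the runtime, scaling and infrastructure. The runtime version and file layout shown here are indicative assumptions, not prescriptions from this text.

    # main.py : a minimal Flask application. App Engine's standard environment
    # can serve a WSGI app like this without the developer managing any servers.
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def hello():
        # Only business logic lives here; scaling, patching and load balancing
        # are handled by the platform.
        return "Hello from a PaaS-hosted application!"

    # app.yaml : the deployment descriptor read by the platform (shown as a
    # comment for brevity; in practice it is a separate YAML file).
    #   runtime: python39      # assumed runtime version
    #   handlers:
    #     - url: /.*
    #       script: auto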
PaaS has a few disadvantages. It locks the developer and the PaaS platform in a
solution specific to a platform vendor. For example, an application developed in Python
using Google API on Google App Engine might work only in that environment.
On the other hand, PaaS may not be a good choice in the following situations :
 When the application must be portable.
 When proprietary programming languages are used.
 When there is a need for custom hardware and software.
Major PaaS applications include software development projects where developers and
users collaborate to develop applications and automate testing services.
3.4.2.1 Power of PaaS

PaaS offers promising services and continues to offer a growing list of benefits. The
following are some standard features that come with a PaaS solution :


 Source code development : PaaS solutions provide the users with a wide range of
language choices including stalwarts such as Java, Perl, PHP, Python and Ruby.
 Websites : PaaS solutions provide environments for creating, running and
debugging complete websites, including user interfaces, databases, privacy and
security tools. In addition, foundational tools are also available to help developers
update and deliver new web applications to meet the fast-changing needs and
requirements of their user communities.
 Developer sandboxes : PaaS also provides dedicated “sandbox” areas for
developers to check how snippets of a code perform prior to a more formal test.
Sandboxes help the developers to refine their code quickly and provide an area
where other programmers can view a project, offer additional ideas and suggest
changes or fixes to bugs.
The advantages of PaaS go beyond relieving the overheads of managing servers,
operating systems, and development frameworks. PaaS resources can be provisioned and
scaled quickly, within days or even minutes, because the organisation does not have to
host any infrastructure on premises. PaaS may also help organisations reduce costs
through the multitenancy model of cloud computing, which allows multiple entities to
share the same IT resources. The costs are also predictable, because the fees are
pre-negotiated every month.
The following boosting features can empower a developer’s productivity, if efficiently
implemented on a PaaS site :
 Fast deployment : For organisations whose developers are geographically scattered,
seamless access and fast deployment are important.
 Integrated Development Environment (IDE) : PaaS must provide the developers
with Internet - based development environment based on a variety of languages,
such as Java, Python, Perl, Ruby etc., for scripting, testing and debugging their
applications.
 Database : Developers must be provided with access to data and databases. PaaS
must provision services such as accessing, modifying and deleting data.
 Identity management : Some mechanism for authentication management must be
provided by PaaS. Each user must have a certain set of permissions with the
administrator having the right to grant or revoke permissions.
 Integration : Leading PaaS vendors, such as Amazon, Google App Engine or
Force.com, provide integration with external or web-based databases and services.
This is important to ensure compatibility.


 Logs : PaaS must provide APIs to open and close log files, write and examine log
entries and send alerts for certain events. This is a basic requirement of application
developers irrespective of their projects.
 Caching : This feature can greatly boost application performance. PaaS must make
available a tool for developers to send a resource to the cache and to flush the cache;
a small sketch of this pattern is given after this list.
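The following is a minimal, platform-neutral sketch in Python of the cache-aside pattern mentioned in the caching point above. The in-memory dictionary merely stands in for whatever caching service a particular PaaS platform exposes; the function names are illustrative and not part of any vendor's API.

    import time

    # Toy in-memory cache standing in for a PaaS-provided caching service.
    _cache = {}
    TTL_SECONDS = 60   # assumed time-to-live for cached entries

    def fetch_from_backend(key):
        # Placeholder for an expensive call to a database or web service.
        return f"value-for-{key}"

    def get_resource(key):
        entry = _cache.get(key)
        if entry is not None and time.time() - entry[1] < TTL_SECONDS:
            return entry[0]                     # cache hit
        value = fetch_from_backend(key)         # cache miss: fetch and store
        _cache[key] = (value, time.time())
        return value

    def flush_cache():
        # Corresponds to the "flush the cache" operation described above.
        _cache.clear()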
3.4.2.2 Complications with PaaS

PaaS can significantly affect an application’s performance, availability and flexibility.


However, there are critical issues to consider. The following are some of the
complications or issues of using PaaS :
Interoperability : PaaS works best on each provider’s own cloud platform, allowing
customers to make the most value out of the service. But the risk here is that the
customisations or applications developed in one vendor’s cloud environment may not be
compatible with another vendor and hence not necessarily migrate easily to it.
Although customers often accept being tied to a single vendor, this may not always be the
case. Users may want to keep their options open. In
this situation, developers can opt for open - source solutions. Open - source PaaS
provides elasticity by revealing the underlying code and the ability to install the PaaS
solution on any infrastructure. The disadvantage of using an open source version of PaaS
is that certain benefits of an integrated platform are lost.
Compatibility : Most businesses have a restricted set of programming languages,
architectural frameworks and databases that they deploy. It is thus important to make
sure that the vendor you choose supports the same technologies. For example, if you are
strongly dedicated to a .NET architecture, then you must select a vendor with native .NET
support. Likewise, database support is critical to performance and minimising
complexity.
Vulnerability and security : Multitenancy lets users be spread over interconnected
hosts. The providers must take adequate security measures to protect these vulnerable
hosts from attacks, so that an attacker cannot easily access the resources of the host or the
objects of other tenants.
Providers have the ability to access and modify user objects/systems. The following
are the three ways by which security of an object can be breached in PaaS systems :
 A provider may access any user object that resides on its hosts. This type of attack is
inevitable but can be avoided to some extent by trusted relations between the user
and the provider.


 Co-tenants, who share the same resources, may mutually attack each other’s objects.
 Third parties may attack a user object. Objects need to be securely coded to defend
themselves.
Cryptographic methods, namely symmetric and asymmetric encryption, hashing and
signatures, are the solution to object vulnerability. It is the responsibility of the providers
to protect the integrity and privacy of user objects on a host.
Vendor lock-in : Owing to the lack of standardisation, vendor lock-in becomes a
key barrier that stops users from migrating to cloud services. Technology-related solutions
are being built to tackle this problem of vendor lock-in. Most customers are unaware of
the terms and conditions of the providers that prevent interoperability and portability of
applications. A number of strategies are proposed on how to avoid/lessen lock-in risks
before adopting cloud computing.
Lock-in issues arise when a company decides to change cloud providers but is unable
to migrate its applications or data to a different vendor. This heterogeneity of cloud
semantics creates technical incompatibility, which in turn leads to interoperability and
portability challenges. This makes interoperation, collaboration, portability and
manageability of data and services a very complex task.

3.4.3 Software as a Service


Software-as-a-Service is specifically designed for on-demand application or software
delivery to cloud users. It gives remote access to software that resides on a cloud server
rather than on the user's device. Therefore, the user does not need to install the required
software on their local device, as it is provided remotely over the network. The consumer
of a SaaS application only requires thin client software, such as a web browser, to access
the cloud-hosted application. This reduces the hardware requirements for end users and
allows centralized control, deployment and maintenance of the software.
SaaS is viewed as a complete cloud model in which the hardware, the software and the
solution are all provided as a single service.
You can denote SaaS as software deployed on the cloud or on a hosted service accessed
through a browser, from anywhere over the internet. The user accesses the software, but
all the other aspects of the service are abstracted away from the user. Some examples of
popular SaaS applications are Google Docs, Hotmail, Salesforce and Gmail. The structure
of the SaaS system is illustrated in Fig. 3.4.5.


Fig. 3.4.5 : Structure of SaaS

SaaS provides the capability to use applications supplied by the service provider, but
does not allow control of the platform or the infrastructure. Most users are familiar with
SaaS systems because they offer a substitute for local software. Examples are Google
Calendar, Zoho Office Suite and Gmail.
SaaS applications come in a variety of forms, including custom software such as CRM
applications, helpdesk applications, HR applications, billing and invoicing applications
and so on. SaaS applications may not be fully customisable, but many of them provide
APIs for developers to create customised applications.
The APIs allow modifications to the security model, data schema, workflow
characteristics and other functionality of the service as experienced by the user. A few
examples of SaaS platforms enabled by APIs include Salesforce.com, Quicken.com and
others; a small sketch of such API access is given after the list below. SaaS apps are
delivered by CSPs, which implies that the user does not have a hand in infrastructure
management or individual app capabilities; rather, the SaaS apps are accessed over a thin
client web interface. SaaS provides the following services :
 Enterprise - level services
 Web 2.0 applications including social networking, blogs, wiki servers, portal
services, metadata management and so on.
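To make programmatic SaaS access concrete, here is a minimal sketch of a client calling a generic SaaS-style REST API over HTTPS with the Python requests library. The endpoint URL, token and field names are purely illustrative and do not correspond to any particular vendor's API.

    import requests

    # Illustrative values only; a real SaaS API defines its own URL,
    # authentication scheme and resource schema.
    BASE_URL = "https://api.example-saas.com/v1"
    TOKEN = "replace-with-a-real-access-token"
    headers = {"Authorization": f"Bearer {TOKEN}"}

    # Create a record in the hosted application (for example, a CRM contact).
    payload = {"name": "Ada Lovelace", "email": "ada@example.com"}
    response = requests.post(f"{BASE_URL}/contacts", json=payload, headers=headers)
    response.raise_for_status()
    contact_id = response.json()["id"]

    # Read it back; all storage and processing happen on the provider's side.
    contact = requests.get(f"{BASE_URL}/contacts/{contact_id}", headers=headers)
    print(contact.json())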
Some of the common characteristics found in SaaS applications are as follows :
 Applications deployed on SaaS are available over the internet and can be accessed
from any location.
 Software can be licensed based on subscriptions or billed based on usage, usually
on a recurring basis.
 The vendor monitors and maintains the software and the service.
 SaaS applications are cheaper because they reduce the cost of distribution and
maintenance. End - user costs are also reduced significantly.
 SaaS enables faster rollout, as features such as automatic rollouts, upgrades, patch
management and other tasks are easier to implement from a centralised system.
 SaaS applications can scale up or scale down based on demand, and they have a
lower barrier to entry compared to their locally installed competitors.
 All SaaS users have the same version of the software, and hence the issue of
compatibility is eliminated.
 SaaS has the capacity to support multiple users.
In spite of the above benefits, SaaS has some drawbacks. For example, SaaS is not suited
to applications that need real-time response, or to situations where data is not allowed to
be hosted externally.

3.5 Architectural Design Challenges


The design of the cloud architecture plays an important role in making cloud services
successful in all respects, but it still faces some challenges. The major challenges involved
in the architectural design of cloud computing are shown in Fig. 3.5.1 and explained as follows.

Fig. 3.5.1 : Architectural design challenges in cloud

3.5.1 Challenges related to Data Privacy, Compliance and Security Concerns


Presently, most cloud offerings basically run on public networks, which renders the
infrastructure more susceptible to attack. The most common attacks on the
network include buffer overflows, DoS attacks, spyware, malware, root kits, trojan horses
and worms. With well-known technologies such as encrypted data, virtual LANs and
network middleboxes such as firewalls, packet filters etc., many challenges can be solved
immediately. Newer attacks may result from hypervisor malware, guest hopping and
hijacking or VM rootkits in a cloud environment. Another form of attack on VM
migrations is the man-in-the-middle attack. The passive attacks typically steal personal
data or passwords while active attacks can exploit data structures in the kernel that will
cause significant damage to cloud servers.


To protect against cloud attacks, one can encrypt the data before placing it in the cloud.
In many countries there are laws that require SaaS providers to keep consumer data and
copyrighted material within national boundaries; these are also called compliance or
regulatory standards. Many countries still do not have compliance laws, so it is necessary
to check the cloud service provider's SLA to see how compliance is enforced for the
services.

3.5.2 Challenges related to Unpredictable Performance and Bottlenecks


In cloud computing, the cloud platform is responsible for deploying and running
services on top of a resource pool that shares hardware drawn from different physical
servers. In a production environment, multiple Virtual Machines (VMs) share resources
such as CPU, memory, I/O and network with each other. Whenever I/O devices are
shared between VMs, provisioning becomes a big challenge because their I/O is
interleaved, which can produce unpredictable performance and system bottlenecks. The
problem becomes wider when such I/O resources are pooled across cloud boundaries; in
such scenarios, data placement and transport become complicated. To overcome this,
data transfer bottlenecks must be removed, bottleneck links must be widened and weak
servers in the cloud infrastructure should be replaced. One solution to this challenge is to
improve the I/O architectures and operating systems used in physical servers, so that
interrupts and I/O channels can be virtualized efficiently.

3.5.3 Challenges related to Service Availability and Vendor/Data Lock-in


Due to the popularity of cloud computing, many organizations run their mission-critical
or business-critical applications on clouds with shared infrastructure provided by cloud
service providers. Any compromise in service availability may therefore result in huge
financial loss. Relying on a single enterprise cloud service often leads to a single point of
failure, because even if that provider has multiple data centers located in different
geographic regions, it may still share a common software infrastructure and accounting
system. The solution to this challenge is to use multiple cloud providers, which offers
greater protection from failures and helps ensure high availability for organizations.
Distributed Denial
of Service (DDoS) attacks are another obstacle to availability. Criminals try to cut into
SaaS providers' revenues by making their services unavailable. Some utility computing
services give SaaS providers the ability to scale up quickly to defend themselves against
DDoS attacks.
In some cases, the lock-in concern arises from the failure of a single company that was
providing cloud storage. Likewise, because of vendor lock-in in the solutions of some
cloud service providers, organizations face difficulties in migrating to a new cloud
service provider. To mitigate the challenges of data lock-in and vendor lock-in, software
stacks can be used to enhance interoperability between various cloud platforms, and
standardized APIs can guard against data loss caused by the failure of a single company.
Standardization also enables "surge computing", in which the same software framework
runs in both public and private clouds and the public cloud is used to absorb additional
tasks that cannot be handled efficiently in a private cloud's data center.

3.5.4 Challenges related to Cloud Scalability, Interoperability and Standardization

In cloud computing, the pay-as-you-go model refers to a utility-based model in which the
bill for storage and network bandwidth is calculated according to the number of bytes
used. Billing for computation differs depending on the degree of virtualization. Google
App Engine scales up and down automatically in response to load, and users are charged
according to the cycles used. Amazon Web Services charges by the hour for the number
of VM instances used, even when a machine is idle. The opportunity here is to scale up
and down quickly in response to load variation, in order to save money without
breaching SLAs. For virtualization, the Open Virtualization Format (OVF) defines an
open, secure, portable, efficient and extensible format for the packaging and delivery of
VMs. It also specifies a format for distributing software to be run in VMs, and a transport
mechanism for VM templates that can be applied to different virtualization platforms
with different virtualization levels.
This VM format does not depend on the use of a particular host platform, virtualization
platform or guest operating system. The approach is platform-agnostic packaging of
virtual machines, combined with certification and attestation of the bundled software;
the package can also support virtual appliances that span more than one VM. For cloud
standardization, virtual appliances should be able to run on any virtual platform, and
VMs should be able to run on hypervisors across heterogeneous hardware platforms. The
cloud platform should also introduce live cross-platform migration between x86
Intel and AMD technologies and support legacy load balancing hardware to avoid the
challenges related to interoperability.

3.5.5 Challenges related to Software Licensing and Reputation Sharing


Most cloud computing providers depend primarily on open source software, as the
commercial software licensing model is not well suited to utility computing. The
opportunity is either to remain popular with open source, or to encourage commercial
software companies to adjust their licensing structures to suit cloud computing better;
one may consider using both pay-for-use and bulk licensing schemes to broaden the
business. Reputation sharing is a related concern: bad conduct by one customer can affect
the reputation of the cloud as a whole. For example, in AWS, spam-prevention services
can hamper smooth VM operation by blacklisting EC2 IP addresses. One remedy would
be to build reputation-guarding services similar to the "trusted e-mail" services currently
offered to providers hosted on smaller ISPs. Another legal issue concerns the transfer of
legal responsibility: cloud providers want consumers to remain legally accountable and
vice versa. This problem needs to be resolved at the SLA level.

3.5.6 Challenges related to Distributed Storage and Bugs in Software


In cloud applications, database services grow continuously. The opportunity is to build
a storage infrastructure that not only meets this growth but also combines it with the
cloud benefit of scaling up and down dynamically on demand. This calls for the design of
efficient distributed SANs. The data centers must meet programmers' expectations in
terms of scalability, system reliability and HA. A major problem in cloud computing is
testing data consistency in SAN-connected data centers. Large-scale distributed bugs
cannot easily be reproduced, so debugging must take place at scale in the production
data centers, and hardly any data center offers that convenience. One solution may be to
rely on VMs in cloud computing: the virtualization layer can capture valuable
information in ways that are impossible without VMs. Debugging on simulators is
another way to attack the problem, if the simulator is well designed.

3.6 Cloud Storage


With the rise in the popularity of cloud computing, you may be wondering where and
how the data is stored in the cloud. Cloud storage is the model in which digital data is
stored in logical pools. Your data is kept in an online repository, so it is the
responsibility of the storage service provider to take care of the data files. Take an
example of the email service you are using, like Gmail, Yahoo, etc. The emails you send or
receive are not stored on your local hard disks but are kept on the email provider's servers.
It is important to note that none of the data is stored on your local hard drives.
It is true that all computer owners store data. For many users, finding enough storage
space to hold all the data they have accumulated seems like an impossible mission. Earlier,
people stored information in the computer’s hard drive or other local storage devices, but
today, this data is saved in a remote database. The Internet provides the connection
between the computer and the database. Fig. 3.6.1 illustrates how cloud storage works.

Fig. 3.6.1 : The working of cloud storage

People may store their data on large hard drives or other external storage devices like
thumb drives or compact discs. But with cloud, the data is stored in a remote database.
Fig. 3.6.1 shows a client computer, which has a bulk of data to be stored, and the control
node of a third-party service provider, which manages several databases together.
Cloud storage system has storage servers. The subscriber copies their files to the storage
servers over the internet, which will then record the data. If the client needs to retrieve the
data, the client accesses the data server with a web - based interface, and the server either
sends the files back to the client or allows the client to access and manipulate the data
itself.
Cloud storage is a service model in which data is maintained, managed and backed up
remotely and made available to users over a network. Cloud storage provides extremely
efficient storage of objects that scales to exabytes of data. It allows you to access data from
any storage class instantly, integrate storage into your applications with a single unified
API and optimize performance with ease. It is the responsibility of cloud storage providers
to keep the data available and accessible and to secure and run the physical environment.
Even though data is stored and accessed remotely, you can maintain data both locally and
on the cloud as a measure of safety and redundancy.
At a minimum, a cloud storage system requires one data server connected to the internet.
The client sends copies of its files to that data server, which saves the information, and the
server sends the files back to the client when requested. Through a web-based interface,
the server also allows the client to access and change the files on the server itself whenever
he or she wants. The connection between the computer and the database is provided by
the internet. In practice, however, cloud storage services use tens or hundreds of data
servers. Since servers need maintenance or repair, it is important to store data on several
machines, providing redundancy; without redundancy, cloud storage services could not
guarantee clients access to their information at any given time. There are two techniques
used for storing data on the cloud, called cloud sync and cloud backup, which are
explained as follows.

3.6.1 Difference between Cloud Sync and Cloud Backup


 Cloud sync : Cloud sync keeps the same, most up-to-date versions of files and
folders on client devices and in cloud storage. When you modify the data, sync
uploads the updated files, which can then be downloaded manually by the user;
this is one-way sync. In two-way sync, the cloud acts as the intermediate storage.
Cloud sync is suitable for organisations or people who regularly use multiple
devices. Some cloud sync services are Dropbox, iCloud Drive, OneDrive, Box and
Google Drive. These services match folders on your PC with folders on other
machines or in the cloud, enabling clients to work on a folder or directory from anywhere.
 Cloud backup : Sending a copy of the data over a public network to an off-site
server is called cloud backup and is handled by a third-party service provider. Some
cloud backup services are IBackup, Carbonite, Backblaze, etc. These services work in
the background automatically; the client does not have to take any action, such as
setting up folders. Backup services commonly copy any new or changed
information on your PC to another location.

3.7 Storage as a Service


Storage as a service is a good option for small or medium-scale organisations that are
not in a position to run their own storage infrastructure, have budget constraints and lack
the technical personnel for storage implementation. It is an outsourcing model which
allows third-party providers to rent space on their storage to end users who lack the
operating or capital budget to pay for storage of their own. End users store their data on
rented storage space at a remote location in the cloud. Storage as a service providers rent
their storage space to organizations on a cost-per-gigabyte-stored or cost-per-data-transfer
basis. The end user does not have to pay for the infrastructure; they pay only for how
much data they transfer to and store on the provider's servers.
Storage as a service is a good alternative for small or mid-size businesses that lack the
capital budget to implement and maintain their own storage infrastructure. The key
providers of storage as a service are Amazon S3, Google Cloud Storage, Rackspace, Dell
EMC, Hewlett Packard Enterprise (HPE), NetApp, IBM, etc. It is also promoted as a way
for all companies to mitigate their risks in disaster recovery, provide long-term retention
of records and enhance both business continuity and availability. Small-scale enterprises
find it very difficult and costly to buy dedicated storage hardware for data storage and
backup. This issue is addressed by storage as a service, a business model that helps small
companies rent storage from large companies that have wider storage infrastructure. It is
also suitable when technical staff are not available or have insufficient experience to
implement and manage the storage infrastructure.
Individuals as well as small companies can use storage as a service to save cost and
manage backups; they save on hardware, personnel and physical space. Storage as a
service is also called hosted storage, and the companies that provide it are known as
Storage Service Providers (SSPs). These vendors promote storage as a service as a
suitable way of managing backups in the enterprise, targeting secondary storage
applications. It also helps in mitigating the impact of disasters and supports disaster recovery.
Storage providers are responsible for storing data of their customers using this model.
The storage provider provides the software required for the client to access their stored
data on cloud from anywhere and at any time. Customers use that software to perform
standard storage related activities, including data transfers and backups. Since storage as
a service vendors agree to meet SLAs, businesses can be assured that storage can scale
and perform as required. It can facilitate direct connections to both public and private
cloud storage.
In most instances, organizations that use storage as a service opt for the public cloud for
storage and backup purposes instead of keeping data on premises. The capabilities
provided by storage as a service include backup and restore, disaster recovery, block
storage, SSD storage, object storage, cold storage and bulk data transfer. Backup and
restore refers to backing data up to the cloud, which provides protection and recovery
when data loss occurs. Disaster recovery refers to protecting and replicating data from
Virtual Machines (VMs) in case of a disaster. Block storage allows customers to provision
block storage volumes for lower-latency I/O. SSD storage is another type of storage,
generally used for data-intensive read/write and I/O operations. Object storage systems
are used in data analytics, disaster recovery and cloud applications. Cold storage allows
the quick creation and configuration of low-cost stores for rarely accessed data. Bulk data
transfers can use disks and other equipment for transmitting large volumes of data.
There are many cloud storage providers available on the internet, but some of the
popular storage as a service providers are listed as follows :
 Google Drive - Google provides Google Drive as a storage service for every Gmail
user, who can store up to 15 GB of data free of cost, scalable up to ten terabytes on
paid plans. It allows the use of Google Docs, embedded with the Google account, to
upload documents, spreadsheets and presentations to Google's data servers.
 Microsoft OneDrive - Microsoft provides OneDrive with 5 GB of free storage space,
scalable to 5 TB, for storing users' files. It is integrated with Microsoft 365 and
Outlook mail. It can synchronize files between the cloud and a local folder, and
provides client software for any platform so that files can be stored and accessed
from multiple devices. It allows files to be backed up with ransomware protection
and lets users recover previously saved versions of files from the cloud.
 Dropbox - Dropbox is a file hosting service that offers cloud storage, file
synchronization, personal cloud and client software services. It can be installed and
run on any OS platform. It provides free storage space of 2 GB, which can scale up to
5 TB on paid plans.
 MediaMax and Strongspace - They offer rented storage space for any kind of
digital data to be stored on cloud servers.

3.7.1 Advantages of Storage as a Service


The key advantages of storage as a service are given as follows
 Cost - Storage as a service reduces much of the expense of conventional backup
methods, by offering ample cloud storage space at a small monthly charge.
 Invisibility - Storage as a service is invisible, as no physical presence can be seen in
its deployment, and therefore does not take up valuable office space.
 Security - In this type of service, data is encrypted both in transit and at rest,
preventing unauthorized access to the files.


 Automation - Storage as a service makes the time-consuming process of backup
easier to accomplish through automation. Users can simply select what they want to
back up and when, and the service does the rest.
 Accessibility - By using storage as a service, users can access their data from
smartphones, netbooks, desktops and so on.
 Syncing - Syncing ensures that your files are updated automatically across all of
your devices. This way, the latest version of a file saved on your desktop is also
available on your smartphone.
 Sharing - Online storage services make it easy for users to share their data with just
a few clicks.
 Collaboration - Cloud storage services are also ideal for collaborative purposes.
They allow multiple people to edit and collaborate in a single file or document. So,
with this feature, users don't need to worry about tracking the latest version or who
made any changes.
 Data protection - By storing data on cloud storage services, data is well protected
against all kinds of disasters such as floods, earthquakes and human error.
 Disaster recovery - Data stored in the cloud is not only protected from disasters by
having the same copy at several locations, but can also favor disaster recovery in
order to ensure business continuity.

3.7.2 Disadvantages of Storage as a Service


The disadvantages of storage as a service are given as follows
 Potential downtimes : Due to failure in cloud, vendors may go through periods of
downtime where the service is not available, which may be a major issue for
mission - critical data.
 Limited customization : As the cloud infrastructure is owned and managed by the
service provider, it is less customizable.
 Vendor lock-in : Due to potential for vendor lock-in, it may be difficult to migrate
from one service provider to another.
 Unreliability : In some cases, there is still a possibility that the system could crash
and leave consumers with no means of accessing their stored data; small service
providers in particular can prove unreliable. An unreliable cloud storage system
quickly becomes a liability, since no one wants to save data on an unstable platform
or trust an unstable organization. Most cloud storage providers seek to resolve the
issue of reliability through redundancy.


3.8 Advantages of Cloud Storage


In today's scenario, cloud storage is an extremely important and valuable tool for all
kinds of businesses. Therefore, it is necessary to understand the benefits and risks
associated with cloud storage. We will now discuss some benefits and risks of the cloud
technology.
The following are the benefits of cloud storage :
 Accessibility : With an internet connection, clients can access their information from
anywhere, at any time and on any device, such as smartphones, laptops and tablets.
This reduces the hassle of transferring files, and files remain the same across all devices.
 Greater collaboration : Without wasting time, cloud storage enables you to transfer
or share files or folders in a simple and a quick way. It removes the pain of sending a
lot of emails to share files. This helps save your time and provides better
collaboration. Also, all the changes are automatically saved and shared with the
collaborators.
 Security : Security is a major concern when it comes to your confidential data.
Cloud storage is secure, with various encryption techniques that prevent
unauthorised access. Cloud storage providers complement their services with
additional security layers. Since there are many users with files stored in the cloud,
these services go to great lengths to ensure that the files are not accessed by anyone
who is not authorized to do so.
 Cost - efficient : Cloud storage, which is an online repository, eliminates the cost of
hard drives or any other external devices like compact disks. Organisations do not
need to spend extra money on additional expensive servers, and there is plenty of
space in online storage. Physical storage can be more expensive than cloud storage,
as cloud storage provides remarkably cheaper per-GB pricing without the need to
buy storage hardware such as external drives.
 Instant data recovery : You can access your files in the cloud and recover them in
case of a hard drive failure or some other hardware malfunction. It serves as a
backup solution for data stored locally on your physical drives. Cloud storage allows
easy recovery of your original files and restores them with minimal downtime.
 Syncing and updating : With cloud storage, any time you make changes to a file,
those changes are synchronized and updated across all of the devices from which
you access the cloud.


 Disaster recovery : Companies are highly advised to have an emergency response
plan ready in case of an emergency. Enterprises may use cloud storage as a back-up
service by keeping a second copy of critical files; such files are stored remotely and
can be accessed through an internet connection.

3.8.1 Risks in Cloud Storage


The following are the risks in cloud storage :
 Dependency : This is also known as "vendor lock-in". The term refers to the
difficulty of moving from one cloud service provider to another, largely because of
the volume of data that has to be migrated. Since services run in a remote virtual
environment, the client is given only restricted access to the software and hardware,
which gives rise to concerns about control.
 Unintended permanence : There have been scenarios in which cloud users
complained that specific pictures had been erased, as in the well-known 'iCloud
hack'. Service providers are therefore under a full obligation to ensure that a client's
information is neither damaged nor lost. Clients are urged to make full use of cloud
backup facilities, so that copies of documents can be recovered from the servers
even if the client loses its own records.
 Insecure interfaces and APIs : To manage and interact with cloud services, various
interfaces and APIs are used by customers. Two categories of web - based APIs are
SOAP (based on web services) and REST (based on HTTP). These APIs are easy
targets for man-in-the-middle or replay attacks. Therefore, secure authentication,
encryption and access control must be used to provide protection against these
malicious attacks.
 Compliance risks : This is a risk for organisations that have earned certifications,
either to meet industry standards or to gain a competitive edge, when migrating to
clouds. The risk arises when the cloud provider does not follow its own compliance
requirements or does not allow audits by the cloud customer.

3.8.2 Disadvantages of Cloud Storage


 Privacy concerns : In cloud storage, the data no longer resides on your physical
disks; it is stored on a cloud platform run by cloud service providers. In many cases
the storage solutions are outsourced by cloud providers to other firms, and in such
cases privacy concerns may arise due to the involvement of third-party providers.
 Dependency on internet connection : Data files can only be moved to a cloud
server while your internet connection is working. When your internet connection
faces technical problems or stops functioning, you will face difficulties in
transmitting data to, or recovering it from, the remote server.
 Compliance problems : Many countries restrict cloud service providers from
moving their users' data across the country's geographic boundaries; providers that
do so may be penalized, or their IT operations in that country may be shut down,
which can lead to huge data loss. Therefore, one should never purchase cloud
storage from an unknown source or third party, and should always buy from
well-established companies. Depending on the degree of regulation within your
industry, it might not even be possible to operate within the public cloud; this is
particularly the case for healthcare, financial services and publicly traded
enterprises, which need to be very cautious when considering this option.
 Vulnerability to attacks : With your business information stored in the cloud,
vulnerability to external hacking attacks remains. The internet is not entirely secure,
and for this reason sensitive data can still be stolen.
 Data management : Managing cloud data can be a challenge because cloud storage
systems have their own structures. Your business's current storage management
system may not always fit well with the system offered by the cloud provider.
 Data protection concerns : There are issues regarding the remote storage of
sensitive and essential data. Before adopting cloud technologies, you should be
aware that you are handing confidential business details to a third-party cloud
service provider and that this could potentially harm your firm. That is why it is
crucial to choose a trustworthy service provider that you trust to keep your
information protected.

3.9 Cloud Storage Providers


The cloud storage provider, also known as the Managed Service Provider (MSP), is a
company that provides organizations and individuals with the ability to place and retain
data in an off-site storage system. Customers can lease cloud storage capacity per month
or on demand. The cloud storage provider hosts customer data in its own data center,
providing fee-based computing, networking and storage infrastructure. Individual and
corporate customers can have unlimited storage capacity on the provider's servers at a
low per - gigabyte price. Instead of storing data on local storage devices, such as a hard
disk drive, flash storage or tape, customers choose a cloud storage provider to host data
on a remote data center system. Users can then access these files via an internet
connection. The cloud storage provider also sells non-storage services for a fee.
Enterprises purchase computing, software, storage and related IT components as discrete
cloud services with a pay-as-you-go license. Customers may choose to lease infrastructure
as a service; platform as a service; or security, software and storage as a service. The level
and type of services chosen are set out in a service level agreement signed with the
provider. The ability to streamline costs by using the cloud can be particularly beneficial
for small and medium - sized organizations with limited budgets and IT staff. The main
advantages of using a cloud storage provider are cost control, elasticity and self - service.
Users can scale computing resources on demand as needed and then discard those
resources after the task has been completed. This removes any concerns about exceeding
storage limitations with on-site networked storage. Some popular cloud storage providers
are Amazon Web Services, Google, Microsoft, Nirvanix and so on. Descriptions of popular
cloud storage providers are given as follows :
 Amazon S3 : Amazon S3 (Simple Storage Service) offers a simple cloud services
interface that can be used to store and retrieve any amount of data from anywhere
on the cloud at any time. It gives every developer access to the same highly scalable
data storage infrastructure that Amazon uses to operate its own global website
network. The goal of the service is to optimize the benefits of scale and to pass those
benefits on to the developers.
 Google Bigtable datastore : Google defines Bigtable as a fast and highly scalable
datastore. The Google Cloud Platform allows Bigtable to scale across thousands of
commodity servers that together can store petabytes of data. Bigtable has been
designed with very high speed, versatility and extremely high scalability in mind.
The size of a Bigtable database can be petabytes, spanning thousands of distributed
servers. Bigtable is available to developers as part of Google App Engine, Google's
cloud computing platform.
 Microsoft Live Mesh : Windows Live Mesh was a free-to-use, internet-based file
synchronization application designed by Microsoft to enable files and directories to
be synchronized between two or more computers running Windows or Mac OS. It
supports mesh objects consisting of data feeds, which can be represented in Atom,
RSS, JSON or XML. It uses Live Framework APIs to share any data item between
devices that recognize the data.


 Nirvanix : Nirvanix offers public, hybrid and private cloud storage services with
usage-based pricing. It supports Cloud-based Network Attached Storage (CloudNAS)
for accessing cloud storage from on-premises systems. Nirvanix CloudNAS is intended for
businesses that manage archival, backup or unstructured archives that need long -
term, secure storage, or organizations that use automated processes to migrate files
to mapped drives. The CloudNAS has built - in disaster data recovery and
automatic data replication feature for up to three geographically distributed storage
nodes.

3.10 Simple Storage Service (S3)


Amazon S3 offers a simple web services interface that can be used to store and retrieve
any amount of data from anywhere, at any time on the web. It gives any developer access
to the same scalable, secure, fast, low - cost data storage infrastructure that Amazon uses
to operate its own global website network. S3 is an online backup and storage system. A
high-speed data transfer feature known as AWS Import/Export exchanges data with
AWS by shipping portable storage devices, which are loaded and unloaded over
Amazon's own internal network.
Amazon S3 is a cloud-based storage system that allows the storage of data objects,
ranging from 1 byte up to 5 GB each, in a flat namespace. The storage containers in S3 are
called buckets; a bucket serves the function of a directory, although there is no object
hierarchy within a bucket, and the user saves objects to it, not files. It is important to note
that the concept of a file system is not associated with S3, because file systems are not
supported; only objects are stored. In addition, the user is not required to mount a bucket,
as would be done with a file system. Fig. 3.10.1 shows S3 diagrammatically.

Fig. 3.10.1 : AWS S3

The S3 system allows buckets to be named (Fig. 3.10.2), but the name must be unique in the
S3 namespace across all consumers of AWS. The bucket can be accessed through the S3
web API (with SOAP or REST), which is similar to a normal disk storage system.


Fig. 3.10.2 : Source bucket

The performance of S3 limits it to non-operational functions such as data archiving,
retrieval and disk backup. The REST API is preferred over the SOAP API because it is
easier to work with large binary objects in REST.
Amazon S3 offers large volumes of reliable storage with high protection and low
bandwidth access. S3 is most ideal for applications that need storage archives. For
example, S3 is used by large storage sites that share photos and images.
The APIs to manage the bucket has the following features :
 Create new, modify or delete existing buckets.
 Upload or download new objects to a bucket.
 Search and identify objects in buckets.
 Identify metadata associated with objects and buckets.
 Specify where the bucket is stored.
 Provide public access to buckets and objects.
The S3 service can be used by many users as a backup component in a 3-2-1 backup
method. This implies that your original data is 1, a copy of your data is 2 and an off-site
copy of data is 3. In this method, S3 is the 3rd level of backup. In addition to this, Amazon
S3 provides the feature of versioning.


In versioning, every version of an object stored in an S3 bucket is retained, provided the
user enables the versioning feature. Any HTTP or REST operation, namely PUT, POST,
COPY or DELETE, creates a new version of the object that is stored along with the older
versions. A GET operation retrieves the newest version of the object, but the ability to
recover and undo actions is also available. Versioning is a useful method for preserving
and archiving data.
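As a hedged illustration of these operations, the following sketch uses the AWS SDK for Python (boto3) to create a bucket, enable versioning and upload an object. The bucket name and region are placeholders; as noted earlier, bucket names must be unique across the whole S3 namespace.

    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")   # assumed region
    bucket = "example-unique-bucket-name"              # placeholder; must be globally unique

    # Create the bucket (the S3 namespace is flat, so the name is global).
    s3.create_bucket(Bucket=bucket)

    # Enable versioning so every PUT/DELETE keeps the older versions of an object.
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Upload an object twice; S3 stores objects under keys, not files in directories.
    s3.put_object(Bucket=bucket, Key="backups/notes.txt", Body=b"first version")
    s3.put_object(Bucket=bucket, Key="backups/notes.txt", Body=b"second version")

    # GET returns the newest version by default; older versions remain recoverable.
    latest = s3.get_object(Bucket=bucket, Key="backups/notes.txt")["Body"].read()
    print(latest)   # b'second version'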

3.10.1 Amazon Glacier


Amazon Glacier is a very low-price online file storage web service that offers secure,
flexible and durable storage for online data backup and archiving. The service is specially
designed for data that is not accessed frequently; data for which a retrieval time of three
to five hours is acceptable is a good fit for Amazon Glacier.
You can store virtually any type, format and amount of data using Amazon Glacier.
Files in ZIP and TAR format are the most common types of data stored in Amazon Glacier.
Some of the common uses of Amazon Glacier are :
 Replacing traditional tape solutions with longer-lasting backup and archive
storage.
 Storing data which is retained for compliance purposes.

3.10.2 Glacier Vs S3
Both Amazon S3 and Amazon Glacier work in almost the same way; however, certain
important aspects differentiate them. Table 3.10.1 shows a comparison of Amazon Glacier
and Amazon S3 :
Amazon Glacier                                     Amazon S3

Supports archives of up to 40 TB                   Supports objects of up to 5 TB

Archives are identified by system-generated IDs    Objects can use "friendly" key names

Archives are encrypted automatically               Automatic encryption of data is optional

Extremely low-cost storage                         Cost is much higher than Amazon Glacier

Table 3.10.1 : Amazon Glacier Vs Amazon S3

You can also use the Amazon S3 interface to avail of Amazon Glacier's offerings without
learning a new interface. This is done by using Glacier as an S3 storage class together
with object lifecycle policies, as sketched below.
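The following is a minimal sketch, again with boto3, of a lifecycle rule that transitions objects under a given prefix to the Glacier storage class after a number of days. The bucket name, prefix and day count are illustrative assumptions only.

    import boto3

    s3 = boto3.client("s3")
    bucket = "example-unique-bucket-name"   # placeholder bucket name

    # Lifecycle rule: move objects under the "archive/" prefix to Glacier after 30 days.
    lifecycle = {
        "Rules": [
            {
                "ID": "archive-to-glacier",
                "Status": "Enabled",
                "Filter": {"Prefix": "archive/"},
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            }
        ]
    }

    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=lifecycle,
    )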

Summary

 The cloud architecture design is an important aspect of designing a cloud. Every
cloud platform is intended to meet four essential design goals : scalability,
reliability, efficiency and virtualization. To achieve these goals, certain
requirements have to be considered.
 The layered architecture of a cloud is composed of three basic layers called
infrastructure, platform, and application. These three levels of architecture are
implemented with virtualization and standardization of cloud-provided
hardware and software resources.
 The NIST cloud computing reference architecture is designed with the help of IT
vendors, developers of standards, industries and governmental agencies at a
global level, to support effective cloud computing security standards and their
further development.
 Cloud deployment models are defined according to where the computing infrastructure resides and who controls it. Four deployment models are characterized based on the functionality and accessibility of cloud services, namely public, private, hybrid and community.
 Public cloud services run over the internet, so users who want to consume them need an internet connection on their local device. Private cloud services are used by organizations internally and most of the time run over an intranet connection. Hybrid cloud services are composed of two or more clouds and offer the benefits of multiple deployment models, while a community cloud is basically a combination of one or more public, private or hybrid clouds shared by many organizations for a single cause.
 The most widespread services of cloud computing are categorised into three
service classes which are also called Cloud service models namely IaaS, PaaS
and SaaS.
 Infrastructure-as-a-Service (IaaS) can be defined as the use of servers, storage, computing power, network and virtualization to form utility-like services for users. Platform-as-a-Service (PaaS) can be defined as a computing platform that allows the user to create web applications quickly and easily, without worrying about buying and maintaining the software and infrastructure, while Software-as-a-Service (SaaS) is specifically designed for on-demand application or software delivery to cloud users.
 There are six architectural design challenges in cloud computing, concerning data privacy and security; compliance; performance; interoperability and standardization; service availability and licensing; and data storage and bugs.


 Cloud storage is a service model in which data is maintained, managed and backed up remotely and made available to users over a network. Cloud storage provides extremely efficient storage of objects that scales to exabytes of data.
 Storage as a Service is an outsourcing model which allows third-party providers to rent space on their storage to end users who lack the budget or capital to pay for it on their own.
 The cloud storage provider, also known as the Managed Service Provider (MSP),
is a company that provides organizations and individuals with the ability to
place and retain data in an off-site storage system.
 Amazon S3 offers a simple web services interface that can be used to store and
retrieve any amount of data from anywhere, at any time on the web. It gives any
developer access to the same scalable, secure, fast, low-cost data storage
infrastructure that Amazon uses to operate its own global website network.

Short Answered Questions

Q.1 Bring out differences between private cloud and public cloud. AU : Dec.-16
Ans. : The differences between private cloud and public cloud are given in Table 3.1.

Sr. No | Feature | Public Cloud | Private Cloud
1 | Scalability | Very high | Limited
2 | Security | Less secure | Most secure
3 | Performance | Low to medium | Good
4 | Reliability | Medium | High
5 | Upfront cost | Low | Very high
6 | Quality of service | Low | High
7 | Network | Internet | Intranet
8 | Availability | For the general public | Organization's internal staff
9 | Example | Windows Azure, AWS etc. | OpenStack, VMware Cloud, CloudStack, Eucalyptus etc.

Table 3.1 : Comparison between public cloud and private cloud

Q.2 Why do we need hybrid cloud ? AU : Dec.-16


Ans. : The hybrid cloud services are composed of two or more clouds that offer the benefits of multiple deployment models. A hybrid cloud mostly comprises an on-premise private cloud and an off-premise public cloud, to leverage the benefits of both and allow users inside and outside the organization to access it. The hybrid cloud provides flexibility such that users can migrate their applications and services from the private cloud to the public cloud and vice versa. It has become one of the most favored models in the IT industry because of eminent features like mobility, customized security, high throughput, scalability, disaster recovery, easy backup and replication across clouds, high availability and cost efficiency. The other benefits of hybrid cloud are :
 Easy accessibility between the private cloud and the public cloud, with a plan for disaster recovery.
 We can take a decision about what needs to be shared on public network and what
needs to be kept private.
 Get unmatched scalability as per demand.

 Easy to control and manage public and private cloud resources.

Q.3 Write a short note on community cloud. AU : Dec.-18


Ans. : Refer section 3.3.4.

Q.4 Summarize the differences between PaaS and SaaS. AU : May-17


Ans. : The differences between PaaS and SaaS are given as follows.

Platform as a Service (PaaS) | Software as a Service (SaaS)
It is used for providing a platform to develop, deploy, test or run web applications quickly and easily, without worrying about buying and maintaining the software and infrastructure. | It is used for on-demand software or application delivery over the internet or intranet.
It is used for web hosting. | It is used for software or application hosting.
It provides tools for development, deployment and testing of software, along with middleware solutions, databases and APIs for developers. | It provides a hosted software stack to the users, from which they can access particular software at any time over the network.
It is used by developers. | It is used by end users.
The abstraction in PaaS is moderate. | The abstraction in SaaS is very high.
It has a significantly lower degree of control than SaaS. | It has a higher degree of control than PaaS.
Risk of vendor lock-in is medium. | Risk of vendor lock-in is very high.


Operational cost is lower than IaaS. | Operational cost is minimal compared to IaaS and PaaS.
It has lower portability than IaaS. | It doesn't provide portability.
Examples : AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App Engine, Apache Stratos, OpenShift | Examples : Google Apps, Dropbox, Salesforce, Cisco WebEx, Concur, GoToMeeting

Q.5 Who are the major players in the cloud ? AU : May-19


Ans. : There are many major players who provide cloud services; some of them, with the services they support, are given in Table 3.2.
Sr. No. | Cloud service provider | Supported services | Deployment model
1 | Amazon Web Services (AWS) | Infrastructure as a Service using EC2, Platform as a Service using Elastic Beanstalk, Database as a Service using RDS, Storage as a Service using S3, Network as a Service using Pureport, Containers as a Service using Amazon Elastic Container Service, serverless computing using Lambda, etc. | Public cloud
2 | OpenStack | Infrastructure as a Service using Nova, Platform as a Service using Solum, Database as a Service using Trove, Network as a Service using Neutron, Big Data as a Service using Sahara, etc. | Private cloud
3 | Google Cloud Platform | Infrastructure as a Service using Google Compute Engine, Platform as a Service using Google App Engine, Software as a Service using Google Docs, Gmail and Google Suite, Database as a Service using Cloud SQL, Containers as a Service using Kubernetes, serverless computing using Cloud Functions, Big Data as a Service using BigQuery, Storage as a Service using Google Cloud Storage, etc. | Public cloud
4 | Microsoft Azure | Infrastructure as a Service using Azure Virtual Machines, Platform as a Service using Azure App Services, Database as a Service using Azure SQL, Storage as a Service using Azure Blob Storage, Containers as a Service using Azure Kubernetes Service, serverless computing using Azure Functions, etc. | Public cloud


5 | Salesforce | Software as a Service | Public cloud
6 | Oracle Cloud | Infrastructure as a Service using Oracle Cloud Infrastructure (OCI), Platform as a Service using Oracle Application Container, Storage as a Service using Oracle Cloud Storage (OCI), Containers as a Service using Oracle Kubernetes Service, serverless computing using Oracle Cloud Fn, etc. | Public cloud
7 | Heroku Cloud | Platform as a Service | Public cloud

Table 3.2 : Major cloud service providers and their supported services

Q.6 What are the basic requirements for cloud architecture design ?
Ans. : The basic requirements for cloud architecture design are given as follows :

 The cloud architecture design must provide automated delivery of cloud services
along with automated management.
 It must support latest web standards like Web 2.0 or higher and REST or RESTful
APIs.
 It must support very large-scale HPC infrastructure with both physical and virtual machines.
 The architecture of cloud must be loosely coupled.

 It should provide easy access to cloud services through a self-service web portal.

 Cloud management software must be efficient in receiving the user's request, finding the correct resources and then calling the provisioning services that invoke the resources in the cloud.
 It must provide enhanced security for shared access to the resources from data
centers.
 It must use cluster architecture for getting the system scalability.

 The cloud architecture design must be reliable and flexible.

 It must provide efficient performance and faster speed of access.

Q.7 What are different layers in layered cloud architecture design ?


Ans. : The layered architecture of a cloud is composed of three basic layers called infrastructure, platform and application. The infrastructure layer consists of virtualized services for computing, storage and networking. It is responsible for provisioning infrastructure components like compute (CPU and memory), storage, network and I/O resources to run virtual machines or virtual servers along with virtual storage. The platform layer is responsible for providing a readily available development and deployment platform for web applications to cloud users, without requiring them to install anything on a local device. The platform layer has a collection of software tools for development, deployment and testing of software applications. A collection of all software modules required for SaaS applications forms the application layer. This layer is mainly responsible for on-demand application delivery. In this layer, software applications include day-to-day office management software used for information collection, document processing, calendaring and authentication. Enterprises also use the application layer extensively in business marketing, sales, Customer Relationship Management (CRM), financial transactions and Supply Chain Management (SCM).
Q.8 What are different roles of cloud providers ?
Ans. : Cloud provider is an entity that offers cloud services to interested parties. A
cloud provider manages the infrastructure needed for providing cloud services. The
CSP also runs the software to provide services, and organizes the service delivery to
cloud consumers through networks.
SaaS providers then deploy, configure, maintain and update all operations of the
software application on the cloud infrastructure, in order to ensure that services are
provisioned and to fulfil cloud consumer service requests. SaaS providers assume most
of the responsibilities associated with managing and controlling applications deployed
on the infrastructure. On the other hand, SaaS consumers have no or limited
administrative controls.
The major activities of a cloud provider include :
 Service deployment : Service deployment refers to provisioning private, public,
hybrid and community cloud models.
 Service orchestration : Service orchestration implies the coordination, arrangement and management of cloud infrastructure to offer optimized capabilities of cloud services. The capabilities must be cost-effective in managing IT resources and must be determined by strategic business needs.
 Cloud services management : This activity involves all service-related functions
needed to manage and operate the services requested or proposed by cloud
consumers.
 Security : Security, which is a critical function in cloud computing, spans all layers in
the reference architecture. Security must be enforced end-to-end. It has a wide range
from physical to application security. CSPs must take care of security.


Fig. 3.1 : Major activities of a cloud provider

 Privacy : Privacy in cloud must be ensured at different levels, such as user privacy,
data privacy, authorization and authentication, and it must also have adequate
assurance levels. Since clouds allow resources to be shared, privacy challenges are a
big concern for consumers using clouds.
Q.9 What are different complications in PaaS ?
Ans. : The following are some of the complications or issues of using PaaS :

 Interoperability : PaaS works best on each provider’s own cloud platform, allowing
customers to make the most value out of the service. But the risk here is that the
customisations or applications developed in one vendor’s cloud environment may
not be compatible with another vendor, and hence not necessarily migrate easily to
it.
Although most of the times customers agree with being hooked up to a single
vendor, this may not be the situation every time. Users may want to keep their
options open. In this situation, developers can opt for open-source solutions. Open-source PaaS provides flexibility by revealing the underlying code and the ability to install the PaaS solution on any infrastructure. The disadvantage of using an open-source version of PaaS is that certain benefits of an integrated platform are lost.
 Compatibility : Most businesses have a restricted set of programming languages,
architectural frameworks and databases that they deploy. It is thus important to
make sure that the vendor you choose supports the same technologies. For example,
if you are strongly dedicated to a .NET architecture, then you must select a vendor
with native .NET support. Likewise, database support is critical to performance and
minimising complexity.


 Vulnerability and Security : Multitenancy allows users to be spread over interconnected hosts. The providers must take adequate security measures in order to protect these vulnerable hosts from attacks, so that an attacker is not able to easily access the resources of the host and the tenant objects.
 Providers have the ability to access and modify user objects/systems. The following are the three ways by which the security of an object can be breached in PaaS systems :
o A provider may access any user object that resides on its hosts. This type of attack is inevitable, but can be avoided to some extent by a trusted relationship between the user and the provider.
o Co-tenants, who share the same resources, may mutually attack each other's objects.
o Third parties may attack a user object. Objects need to be securely coded to defend themselves.
Cryptographic methods, namely symmetric and asymmetric encryption, hashing and signatures, are the solution to object vulnerability (a minimal integrity-check sketch is given at the end of this answer). It is the responsibility of the providers to protect the integrity and privacy of user objects on a host.
 Vendor lock-in : Pertaining to the lack of standardisation, vendor lock-in becomes a
key barrier that stops users from migrating to cloud services. Technology related
solutions are being built to tackle this problem of vendor lock-in. Most customers are
unaware of the terms and conditions of the providers that prevent interoperability
and portability of applications. A number of strategies are proposed on how to
avoid/lessen lock-in risks before adopting cloud computing.
Lock-in issues arise when a company decides to change cloud providers but is
unable to migrate its applications or data to a different vendor. This heterogeneity of
cloud semantics creates technical incompatibility, which in turn leads to
interoperability and portability challenges. This makes interoperation, collaboration,
portability and manageability of data and services a very complex task.
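As a simple, generic illustration of the cryptographic methods mentioned above (not tied to any particular PaaS vendor), the Python sketch below hashes an object and signs the digest with a shared secret, so that tampering by a co-tenant or third party can be detected when the object is read back. The key and data are hypothetical.

import hashlib
import hmac

SECRET_KEY = b"tenant-shared-secret"   # hypothetical key managed by the tenant

def protect(payload: bytes) -> str:
    """Return an HMAC-SHA256 signature stored alongside the object."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, signature: str) -> bool:
    """Recompute the signature on read-back and compare in constant time."""
    return hmac.compare_digest(protect(payload), signature)

data = b"user object stored on a shared PaaS host"
sig = protect(data)
assert verify(data, sig)                       # untouched object verifies
assert not verify(data + b" tampered", sig)    # any modification is detected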
Q.10 Enlist the pros and cons of storage as a service.
Ans. : The key advantages or pros of storage as a service are given as follows :

 Cost - Storage as a service reduces much of the expense of conventional backup


methods, by offering ample cloud storage space at a small monthly charge.
 Invisibility - Storage as a service is invisible, as no physical presence can be seen in
its deployment, and therefore does not take up valuable office space.
 Security - In this type of service, data is encrypted both during transmission and at rest, preventing unauthorized access to the user's files.

 Automation - Storage as a service makes the time-consuming process of backup


easier to accomplish through automation. Users can simply select what and when
they want to backup, and the service does the rest of it.
 Accessibility - By using storage as a service, users can access data from smartphones,
netbooks to desktops, and so on.
 Syncing - Syncing in storage as a service ensures that your files are updated automatically across all of your devices. This way, the latest version of a file stored on a user's desktop is also available on their smartphone.
 Sharing - Online storage services make it easy for users to share their data with just a
few clicks.
 Collaboration - Cloud storage services are also ideal for collaborative purposes. They
allow multiple people to edit and collaborate in a single file or document. So, with
this feature, users don't need to worry about tracking the latest version or who made
any changes.
 Data Protection - By storing data on cloud storage services, data is well protected
against all kinds of disasters, such as floods, earthquakes and human error.
 Disaster Recovery - Data stored in the cloud is not only protected from disasters by
having the same copy at several locations, but can also favor disaster recovery in
order to ensure business continuity.

The disadvantages or cons of storage as a service are given as follows


 Potential downtimes : Due to failure in cloud, vendors may go through periods of
downtime where the service is not available, which may be a major issue for mission-
critical data.
 Limited customization : As the cloud infrastructure is owned and managed by the
service provider, it is less customizable.
 Vendor lock-in : Because of the potential for vendor lock-in, it may be difficult to migrate from one service provider to another.
 Unreliability : In some cases, there is still a possibility that the system could crash and leave consumers with no means of accessing their stored data; a small service provider becomes unreliable in that case. When a cloud storage system is unreliable, it becomes a liability, as no one wants to save data on an unstable platform or trust an unstable organization. Most cloud storage providers seek to resolve the issue of reliability through redundancy.
Q.11 What are different risks in cloud storages ?


Ans. : The following are the risks in cloud storage :

 Dependency : It is also known as "vendor lock-in". The term alludes to the difficulty of moving from one cloud service provider to another, largely because of the effort involved in migrating data. Since services run in a remote virtual environment, the client is given only restricted access to the software and hardware, which gives rise to concerns about control.
 Unintended Permanence : There have been scenarios where cloud users complained that specific pictures had been erased, as in the recent 'iCloud hack'. The service providers are therefore under a full obligation to ensure that the client's information is not damaged or lost. Consequently, clients are urged to make full use of cloud backup facilities, so that duplicates of documents can be recovered from the servers even if the client loses its own records.
 Insecure Interfaces and APIs : To manage and interact with cloud services, various
interfaces and APIs are used by customers. Two categories of web-based APIs are
SOAP (based on web services) and REST (based on HTTP). These APIs are easy
targets for man-in-the-middle or replay attacks. Therefore, secure authentication,
encryption and access control must be used to provide protection against these
malicious attacks.
 Compliance Risks : This is a risk for organisations that have earned certifications, either to meet industry standards or to gain a competitive edge, when migrating to clouds. The risk arises when the cloud provider does not follow its own compliance requirements or does not allow audits by the cloud customer.
Q.12 Enlist the different cloud storage providers.
Ans. : The description about popular cloud storage providers are given as follows :

 Amazon S3 : Amazon S3 (Simple Storage Service) offers a simple cloud services


interface that can be used to store and retrieve any amount of data from anywhere on
the cloud at any time. It gives every developer access to the same highly scalable data
storage infrastructure that Amazon uses to operate its own global website network.
The goal of the service is to optimize the benefits of scale and to pass those benefits
on to the developers.
 Google Bigtable Datastore : Google defines Bigtable as a fast and highly scalable
datastore. The google cloud platform allows Bigtable to scale through thousands of
commodity servers that can store petabytes of data together. Bigtable has been
designed with very high speed, versatility and extremely high scalability in mind.


The size of the Bigtable database can be petabytes, spanning thousands of distributed
servers. Bigtable is now open to developers as part of the Google App Engine, their
cloud computing platform.
 Microsoft Live Mesh : Windows Live Mesh was a free-to-use Internet-based file
synchronization application designed by Microsoft to enable files and directories
between two or more computers to be synchronized on Windows or Mac OS
platforms. It has support of mesh objects that consists of data feeds, which can be
represented in Atom, RSS, JSON, or XML. It uses Live Framework APIs to share any
data item between devices that recognize the data.
 Nirvanix : Nirvanix offers public, hybrid and private cloud storage services with
usage-based pricing. It supports Cloud-based Network Attached Storage
(CloudNAS) to store data on premises. Nirvanix CloudNAS is intended for
businesses that manage archival, backup, or unstructured archives that need long-
term, secure storage, or organizations that use automated processes to migrate files
to mapped drives. The CloudNAS has built-in disaster data recovery and automatic
data replication feature for up to three geographically distributed storage nodes.
Q.13 What is Amazon S3 ?
Ans. : Amazon S3 is a cloud-based storage system that allows storage of data objects ranging from 1 byte up to 5 TB in a flat namespace. The storage containers in S3 are called buckets; a bucket serves the function of a directory, though there is no object hierarchy within a bucket, and the user saves objects (not files) to it. Amazon S3 offers a simple web services interface that can be used to store and retrieve any amount of data from anywhere, at any time on the web. It gives any developer access to the same scalable, secure, fast, low-cost data storage infrastructure that Amazon uses to operate its own global website network.

Long Answered Questions

Q.1 With architecture, elaborate the various deployment models and reference
models of cloud computing. AU : Dec.-17
Ans. : Refer section 3.3 for cloud deployment models and section 3.4 for cloud reference
models.
Q.2 Describe service and deployment models of cloud computing environment
with illustration. How do they fit in NIST cloud architecture ? AU : Dec.-17
Ans. : Refer section 3.3 for cloud deployment models and section 3.4 for cloud reference
models and section 3.2 for NIST cloud architecture.


Q.3 List the cloud deployment models and give a detailed note about them.
AU : Dec.-16
Ans. : Refer section 3.3 for cloud deployment models.

Q.4 Give the importance of cloud computing and elaborate the different types of
services offered by it. AU : Dec.-16
Ans. : Refer section 3.4 for cloud service models.

Q.5 What are pros and cons for public, private and hybrid cloud ? AU : Dec.-18
Ans. : Refer section 3.3 for pros and cons of public, private and hybrid cloud and
section 3.3.5 for their comparison.
Q.6 Describe Infrastructure as a Service (IaaS), Platform-as-a-Service (PaaS) and
Software-as-a-Service (SaaS) with example. AU : Dec.-18
Ans. : Refer section 3.4 for cloud service models for description of Infrastructure as a
Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS).
Q.7 Illustrate the cloud delivery models in detail. AU : Dec.-19
Ans. : Refer section 3.4 for cloud delivery models.

Q.8 Compare and contrast cloud deployment models. AU : Dec.-19


Ans. : Refer section 3.3 for cloud deployment models and 3.3.5 for comparison between
cloud deployment models.
Q.9 Describe the different working models of cloud computing. AU : May-19
Ans. : Refer sections 3.3 and 3.4 for working models of cloud computing which are
deployment models and service models.
Q.10 Write a detailed note on layered cloud architecture design.
Ans. : Refer section 3.1.1.

Q.11 Explain in brief NIST cloud computing reference architecture.


Ans. : Refer section 3.2.

Q.12 Enlist and contrast architectural design challenges of cloud computing.


Ans. : Refer section 3.5.

Q.13 Explain in detail cloud storage along with its pros and cons.
Ans. : Refer section 3.6 for cloud storage and 3.8 for pros and cons of cloud storage.

Q.14 Write a detailed note on storage-as-a-service


Ans. : Refer section 3.7.

Q.15 Explain in brief significance of Amazon S3 in cloud computing.


Ans. : Refer section 3.10.


4 Resource Management and
Security in Cloud
Syllabus
Inter Cloud Resource Management - Resource Provisioning and Resource Provisioning
Methods - Global Exchange of Cloud Resources - Security Overview - Cloud Security
Challenges - Software-as-a-Service Security - Security Governance - Virtual Machine Security -
IAM - Security Standards.

Contents
4.1 Inter Cloud Resource Management
4.2 Resource Provisioning and Resource Provisioning Methods
4.3 Global Exchange of Cloud Resources
4.4 Security Overview
4.5 Cloud Security Challenges
4.6 Software-as-a-Service Security
4.7 Security Governance
4.8 Virtual Machine Security
4.9 IAM
4.10 Security Standards


4.1 Inter Cloud Resource Management


Resource management is a process for the allocation of computing, storage,
networking and subsequently energy resources to a set of applications, in a context that
aims to collectively meet the performance goals of infrastructure providers, cloud users
and applications. The cloud users prefer to concentrate on application performance while
the conceptual framework offers a high-level view of the functional aspect of cloud
resource management systems and all their interactions. Cloud resource management is a
challenge due to the scale of modern data centers, the heterogeneity of resource types, the
interdependence between such resources, the variability and unpredictability of loads,
and the variety of objectives of the different players in the cloud ecosystem.
Whenever any service is deployed on cloud, it uses resources aggregated in a common
resource pool which are collected from different federated physical servers. Sometimes,
cloud service brokers may deploy cloud services on shared servers for their customers
which lie on different cloud platforms. In that situation, the interconnection between
different servers needs to be maintained. Sometimes, there may be a loss of control if any
particular cloud server faces downtime which may generate huge business loss.
Therefore, it’s quite important to look at inter cloud resource management to address the
limitations related to resource provisioning.
We have already seen the NIST architecture for cloud computing which has three
layers namely infrastructure, platform and application.
These three layers are referred to by three services, namely Infrastructure as a Service, Platform as a Service and Software as a Service respectively. Infrastructure as a Service is the foundation layer, which provides compute, storage and network services to the other two layers, Platform as a Service and Software as a Service. Although the three basic services differ in use, they are built on top of each other. In practice, there are five layers required to run cloud applications. The functional layers of cloud computing services are shown in Fig. 4.1.1.


Fig. 4.1.1 Functional layers of Cloud computing

 The consequence is that one cannot directly launch SaaS applications on a cloud platform. The cloud platform for SaaS cannot be built unless the compute, storage and network infrastructure is established.
 In the above architecture, the lower three layers are more closely connected to physical specifications.
 Hardware as a Service (HaaS) is the lowermost layer, which provides the various hardware resources needed to run cloud services.
 The next layer is Infrastructure as a Service, which interconnects all hardware elements using compute, storage and network services.
 The next layer has two services, namely Network as a Service (NaaS), to bind and provision cloud services over the network, and Location as a Service (LaaS), a colocation service to house, control and protect all physical hardware and network resources.
 The next layer is Platform as a Service for web application deployment and delivery, while the topmost layer is used for on-demand application delivery.
In any cloud platform, cloud infrastructure performance is the primary concern for every cloud service provider, while quality of service, service delivery and security are the concerns of cloud users. SaaS applications are subdivided into different business application areas; for example, CRM is used for sales, promotion and marketing services, and was the first SaaS offered successfully on the cloud. Other tools may provide distributed collaboration, financial management or human resources management.
In inter cloud resource provisioning, developers have to consider how to design the
system to meet critical requirements such as high throughput, HA, and fault tolerance.
The infrastructure for operating cloud computing services may be either a physical server


or a virtual server. By using VMs, the platform can be flexible, i.e. running services are
not associated with specific hardware platforms. This adds flexibility to cloud computing
platforms. The software layer at the top of the platform is a layer for storing huge
amounts of data.
Like in the cluster environment, there are some runtime support services accessible in
the cloud computing environment. Cluster monitoring is used to obtain the running state
of the cluster as a whole. The scheduler queues the tasks submitted to the entire cluster
and assigns tasks to the processing nodes according to the availability of the node. The
runtime support system helps to keep the cloud cluster working with high efficiency.
Runtime support is the software needed for browser-initiated applications used by
thousands of cloud customers. The SaaS model offers software solutions as a service,
rather than requiring users to buy software. As a result, there is no initial investment in
servers or software licenses on the customer side. On the provider side, the cost is rather
low compared to the conventional hosting of user applications. Customer data is stored in
a cloud that is either private or publicly hosted by PaaS and IaaS providers.

4.2 Resource Provisioning and Resource Provisioning Methods


The rise of cloud computing indicates major improvements in the design of software
and hardware. Cloud architecture imposes further focus on the amount of VM instances
or CPU cores. Parallelism is being used at the cluster node level. This section broadly
focuses on the concept of resource provisioning and its methods.

4.2.1 Provisioning of Compute Resources


Cloud service providers offer cloud services by signing SLAs with end-users. The
SLAs must commit appropriate resources, such as CPU, memory, and bandwidth that the
user can use for a preset time. The lack of services and under provisioning of resources
would contribute to violation of the SLAs and penalties. The over provisioning of
resources can contribute to under-use of services and, as a consequence, to a decrease in
revenue for the supplier. The design of an automated system to provision resources and
services effectively is a difficult task. The difficulties arise from the unpredictability of
consumer demand, heterogeneity of services, software and hardware failures, power
management and disputes in SLAs signed between customers and service providers.
Cloud architecture and management of cloud infrastructure rely on effective VM
provisioning. Resource provisioning schemes are also used for the rapid discovery of
cloud computing services and data in cloud. The virtualized cluster of servers involve
efficient VM deployment, live VM migration, and fast failure recovery. To deploy VMs,


users use virtual machines as a physical host with customized operating systems for
different applications.
For example, Amazon's EC2 uses Xen as the Virtual Machine Monitor (VMM), which is also used in IBM's Blue Cloud. Some VM templates are also supplied on the EC2 platform, from which users can select different types of VMs, whereas IBM's Blue Cloud does not provide VM templates. In general, any form of VM can run on top of Xen. Microsoft also applies virtualization in its Azure cloud platform. The provider should deliver resource-economic services. The increase in energy waste through heat dissipation from data centers means that power-efficient caching, query processing and heat management schemes are necessary. Public or private clouds promise to streamline software, hardware and data provisioned as on-demand services, in order to save on IT deployment costs and achieve economies of scale in IT operations.
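As a minimal illustration of on-demand VM provisioning on such a platform, the boto3 sketch below launches and later terminates an EC2 instance. The AMI ID, key pair name and instance type are placeholders, and AWS credentials are assumed to be configured.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request one on-demand VM built from a machine image (a VM template).
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t2.micro",
    KeyName="demo-keypair",            # placeholder SSH key pair
    MinCount=1,
    MaxCount=1,
)
instance_id = resp["Instances"][0]["InstanceId"]
print("Provisioned", instance_id)

# Release the resource when it is no longer needed (pay only for what is used).
ec2.terminate_instances(InstanceIds=[instance_id])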

4.2.2 Provisioning of Storage Resources


As cloud storage systems also offer resources to customers, it is likely that data is
stored in the clusters of the cloud provider. The data storage layer in layered architecture
lies at the top of a physical or virtual server. The provisioning of storage resources in
cloud is often associated with the terms like distributed file system, storage technologies
and databases.
Several cloud computing providers have developed large scale data storage services to
store a vast volume of data collected every day. A distributed file system is very essential
for storing large data, as traditional file systems have failed to do that. For cloud
computing, it is also important to construct databases such as large-scale systems based
on data storage or distributed file systems. Some examples of distributed file system are
Google’s GFS that stores huge amount of data generated on web including images, text
files, PDFs or spatial data for Google Earth. The Hadoop Distributed File System (HDFS)
developed by Apache is another framework used for distributed data storage from the
open source community. Hadoop is an open-source implementation of Google's cloud
computing technology. The Windows Azure Cosmos File System also uses the distributed
file system. Since the storage service or distributed file system can be accessed directly, similar to conventional databases, cloud computing also has a form of structured or semi-structured database processing capability. However, there are also other forms of data storage. In cloud computing, another type of data storage is the (key, value) pair or object-based storage. Amazon DynamoDB uses (key, value) pairs to store data in a NoSQL database, while Amazon S3 uses SOAP to navigate objects stored in the cloud.


In storage, numerous technologies are available like SCSI, SATA, SSDs, and Flash
storages and so on. In future, hard disk drives with solid-state drives may be used as an
enhancement in storage technologies. It would ensure reliable and high-performance data
storage. The key obstacles to the adoption of flash memory in data centers have been
price, capacity and, to some extent, lack of specialized query processing techniques.
However, this is about to change as the I/O bandwidth of the solid-state drives is
becoming too impressive to overlook.
Databases are very popular in many applications, as they are used as the underlying storage container. The size of such a database can be very large for the processing of huge quantities of data. The main aim is to store data in structured or semi-structured form so that application developers can use it easily and construct their applications quickly. Traditional databases may hit a performance bottleneck when the system is scaled up; however, many real applications do not need such strong consistency, so cloud databases relax it and can keep growing in size. Typical cloud databases include Google's Bigtable, Amazon's SimpleDB or DynamoDB and the Azure SQL service from Microsoft Azure.
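As a small illustration of (key, value) style storage, the boto3 sketch below writes and reads an item in a DynamoDB table. The table name 'UserProfiles' and its partition key 'user_id' are hypothetical and assumed to exist already.

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Store a semi-structured item under its key.
dynamodb.put_item(
    TableName="UserProfiles",
    Item={
        "user_id": {"S": "u-1001"},
        "name": {"S": "Asha"},
        "plan": {"S": "premium"},
        "logins": {"N": "42"},
    },
)

# Retrieve the item back by its key.
resp = dynamodb.get_item(TableName="UserProfiles",
                         Key={"user_id": {"S": "u-1001"}})
print(resp["Item"]["name"]["S"])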

4.2.3 Provisioning in Dynamic Resource Deployment


The cloud computing utilizes virtual machines as basic building blocks to construct
the execution environment across multiple resource sites. Resource provisioning in
dynamic environment can be carried out to achieve scalability of performance. The Inter-
Grid is a Java-implemented programming model that allows users to build cloud-based
execution environments on top of all active grid resources. The peering structures
established between gateways enable the resource allocation from multiple grids to
establish the execution environment. The InterGrid Gateway (IGG) allocates resources from the local cluster to deploy applications in three stages : requesting the virtual machines, authorizing leases and deploying the virtual machines as demanded. At peak demand, the IGG interacts with another IGG that is capable of allocating resources from a cloud computing provider. The grid has pre-configured peering relationships with other grids, which are managed through the IGG, and the system manages the use of InterGrid resources across several IGGs. The IGG is aware of the peering terms with other grids, selects suitable grids that can provide the required resources, and responds to requests from other IGGs. Request redirection policies decide which peering grid InterGrid selects to process a request and the price at which that grid performs the task. The IGG can even allocate resources from a cloud service provider. The cloud system provides a virtual environment that lets users deploy their applications; like InterGrid, such

technologies use the tools of the distributed grid. The Intergrid assigns and manages a
Distributed Virtual Environment (DVE). It is a cluster of available vms isolated from
other virtual clusters. The DVE Manager component performs resource allocation and
management on behalf of particular user applications. The central component of the IGG
is the schedule for enforcing provisioning policies and peering with several other
gateways. The communication system provides an asynchronous message-passing
mechanism that is managed in parallel by a thread pool.

4.2.4 Methods of Resource Provisioning


There are three cases in a static cloud resource provisioning scheme. In over-provisioning, resources are sized for the peak load and remain under-used the rest of the time. In under-provisioning, demand exceeds the allocated capacity, which results in losses for both the user and the provider because of the shortage of resources. In constant provisioning, a fixed capacity serves a declining user demand and can result in even worse waste of resources. In all such cases, both the user and the provider may lose, because the provisioning of resources has no elasticity.
 There are three resource-provisioning methods which are presented in the
following sections.
 The demand-driven method offers static resources and has been used for many
years in grid computing.
 The event-driven method is based on the expected time-dependent workload.
 The popularity-driven method is based on the monitoring of Internet traffic. We
define these methods of resource provisioning as follows.

4.2.4.1 Demand-Driven Resource Provisioning

In demand-driven resource provisioning, resources are allocated as per the demand of the users in a dynamic environment. This method adds or removes computing instances depending on the current level of utilization of the allocated resources. For example, the demand-driven method automatically allocates two CPUs to a user application when the user has used one CPU more than 60 percent of the time for an extended period. In general, when a resource has exceeded a threshold for a certain amount of time, the scheme increases that resource based on demand; when a resource is utilized below the threshold for a certain amount of time, it can be reduced accordingly. This method is implemented by Amazon Web Services as the auto-scaling feature of its EC2 platform. The method is very easy to implement, but it does not work well if the workload changes abruptly.
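The threshold logic described above can be sketched in a few lines of Python; the 60 percent trigger, the observation window and the scaling step below are illustrative values, not a specific provider's policy.

def scale_decision(cpu_samples, current_instances,
                   upper=0.60, lower=0.20, step=1, min_instances=1):
    """Return the new instance count based on sustained average CPU utilization."""
    avg = sum(cpu_samples) / len(cpu_samples)    # utilization over the window (0.0 - 1.0)
    if avg > upper:
        return current_instances + step          # sustained high load : add capacity
    if avg < lower and current_instances > min_instances:
        return current_instances - step          # sustained low load : release capacity
    return current_instances                     # within thresholds : no change

# Example : one instance loaded above 60 % for the whole window triggers a scale-out.
print(scale_decision([0.72, 0.68, 0.75], current_instances=1))   # -> 2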

4.2.4.2 Event-Driven Resource Provisioning

In event-driven resource provisioning, resources are allocated around events generated by the users at specific times in a dynamic environment. This method adds or removes machine instances based on a specific time event. The approach works well for seasonal or predicted events, where additional resources are required for a short interval : the number of users increases before the event period and decreases over the course of the event. The scheme estimates peak traffic before the event happens. This method results in a small loss of QoS if the event is correctly predicted; otherwise, resources are wasted when events do not follow a fixed pattern.
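A minimal sketch of such time-driven provisioning is shown below; the schedule and instance counts are made-up values for a predicted event window.

import datetime

# Hypothetical schedule : extra instances reserved around a predicted event.
EVENT_SCHEDULE = {
    range(0, 8): 2,     # night : baseline capacity
    range(8, 18): 4,    # working hours
    range(18, 22): 10,  # predicted event window : scale out in advance
    range(22, 24): 2,
}

def desired_capacity(now=None):
    """Return the instance count planned for the current hour."""
    hour = (now or datetime.datetime.now()).hour
    for hours, capacity in EVENT_SCHEDULE.items():
        if hour in hours:
            return capacity
    return 2

print(desired_capacity(datetime.datetime(2020, 9, 1, 19, 30)))   # -> 10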

4.2.4.3 Popularity-Driven Resource Provisioning

In popularity-driven resource provisioning, resources are allocated based on the popularity of certain applications and their demand. The provider monitors Internet traffic for the popularity of certain applications and creates instances as their popularity demands. The scheme anticipates increased traffic with increased popularity. Again, if the predicted popularity is correct, the scheme incurs a minimal loss of QoS; if traffic does not grow as expected, resources may be wasted.

4.3 Global Exchange of Cloud Resources


To serve a large number of users worldwide, the IaaS cloud providers have set up
datacenters in various geographical locations to provide redundancy and ensure
reliability in the event of site failure. However, Amazon is currently asking its cloud
customers (i.e. SaaS providers) to give preference to where they want their application
services to be hosted. Amazon does not have seamless/automatic frameworks for scaling
hosted services across many geographically dispersed data centers. There are many
weaknesses in this approach. First, cloud customers cannot find the best place for their
services in advance because they do not know the origin of their services' consumers.
Secondly, SaaS providers may not be able to meet QoS requirements from multiple
geographical locations of their service consumers. It involves the development of
structures that help complex applications across multiple domains to efficiently federate


cloud data centers to meet cloud customers' QoS targets. Moreover, not a single provider
of cloud infrastructure will be able to set up its data centers, anywhere around the world.
This will make it difficult to meet the QoS standards for all its customers by cloud
applications service (SaaS) providers. They also want to take advantage of the resources
of multiple providers that can best serve their unique needs in cloud infrastructure. In
companies with global businesses and applications such as Internet services, media
hosting and Web 2.0 applications, this form of requirement often arises. This includes the
federation of providers of cloud infrastructure to offer services to multiple cloud
providers. To accomplish it, Intercloud architecture has been proposed to enable
brokerage and the sharing of cloud resources for applications across multiple clouds in
order to scale applications. The generalized Intercloud architecture is shown in Fig. 4.3.1.

Fig. 4.3.1 Intercloud Architecture

The cloud providers can expand or resize their provisioning capacity in a dynamic and competitive manner by leasing the computation and storage resources of other cloud service providers, using the Intercloud architectural principles. This helps providers such as Salesforce.com, which host services based on agreed SLA contracts, to operate in a market-driven resource leasing federation. It offers reliable, on-demand, affordable and QoS-aware services based on virtualization technology, while ensuring high QoS and reducing the cost of operation. Providers must be able to employ market-based utility models as the basis for provisioning virtualized software services and federated hardware infrastructure to heterogeneous user applications.
The Intercloud architecture consolidates the distributed storage and computing capabilities of clouds into a single resource-leasing abstraction. It comprises client brokerage and coordination services which support a utility-based federation of clouds : application scheduling, resource allocation and workload migration. The system facilitates cross-domain integration for on-demand, flexible, energy-efficient and reliable access to infrastructure based on virtualization technology. The Cloud Exchange (CEX) matches the infrastructure demands of application brokers against the available supply. It acts as a market maker, bringing service producers and consumers together to encourage the trading of cloud services on the basis of competitive economic models such as commodity markets and auctions. The SLA (Service Level Agreement) specifies the details of the service to be provided in terms of agreed metrics, and the incentives and penalties for meeting or violating expectations. The availability of a banking system within the market ensures that financial transactions pertaining to SLAs between participants are carried out in a secure and dependable environment.

4.4 Security Overview


Cloud computing provisions resources, applications and information as on-demand services over the network. It offers very high computational power and storage capacity. Nowadays most small and medium size enterprises (SMEs) move to the cloud because of advantages such as lower infrastructure and maintenance costs, a pay-per-use model, scalability, load balancing, location independence, on-demand access, quicker deployment and flexibility.
Although cloud computing has many benefits in most of the aspects, but security
issues in cloud platforms led many companies to hesitate to migrate their essential
resources to the cloud. In this new environment, companies and individuals often worry
about how security, privacy, trust, confidentiality and integrity of compliance can be
maintained. However, the companies that jump to the cloud computing can be even more
worrying about the implications of placing critical applications and data in the cloud. The
migration of critical applications and sensitive data to public and shared across multiple
cloud environments is a major concern for companies that move beyond the network
perimeter defense of their own data center. To resolve these concerns, a cloud software
provider needs to ensure that customers continue to maintain the same security and
privacy controls on their services and applications, provide customers with evidence that
their company and consumers are secure, and can fulfill their service-level agreements,

®
TECHNICAL PUBLICATIONS - An up thrust for knowledge
Cloud Computing 4 - 11 Resource Management and Security in Cloud

and can demonstrate their auditors' compliance. Lack of trust between service providers
and cloud users has prevented cloud computing from being generally accepted as a
solution for on demand service.
Trust and privacy are also more challenging for web and cloud services as most
desktop and server users have resisted leaving user applications to their cloud provider’s
data center. Some users worry about the lack of privacy, security and copyright
protection on cloud platforms. Trust is not a mere technological question, but a social
problem. However, with a technical approach, the social problem can be solved. The
cloud uses virtual environment that poses new security threats that are harder to manage
than traditional configurations of clients and servers. Therefore, a new data protection
model is needed to solve these problems.
Three basic levels of cloud security enforcement are expected. First, facility security requires year-round on-site security at data centers; biometric readers, CCTV (closed-circuit television), motion detection and man traps are frequently deployed. Second, global firewalls, Intrusion Detection Systems (IDSes) and third-party vulnerability assessments are often required to meet fault-tolerant network security. Finally, the platform must secure SSL transmissions, data encryption, strict password policies and certification of the system's trust, since cloud servers can be either physical or virtual machines. Security compliance requires a security-aware cloud architecture that provides remedies for malware-based attacks such as worms, viruses and DDoS attacks that exploit system vulnerabilities. These attacks compromise the functionality of the system or provide intruders with unauthorized access to critical information.

4.4.1 Cloud Infrastructure Security


The cloud computing is made for provision of resources, applications and information
as an on-demand service over the network. It comprises very high computational power
and storage capabilities. Nowadays most of the small and medium size companies are
migrating towards cloud because of its benefits like reduced hardware, no maintenance
cost, pay-as-you go model, scalability, load balancing, location independent access, on-
demand security controls, fast deployment and flexibility etc.
However, many organizations are still hesitant to move to the cloud because of security concerns. There are many security issues in cloud computing which need to be resolved on a priority basis. As we have seen in chapter 3, cloud computing has three service models, called the SPI model, in which infrastructure is the core of all the service models. Infrastructure as a Service comprises the servers, storage, network, virtual machines and virtual operating systems on which the other services are deployed. So, there is a need to protect the infrastructure first. Infrastructure security is an important factor in cloud security. The cloud is composed of a network of connected servers, called hosts, with applications deployed on them.
Infrastructure security follows a three-level security model composed of network level security, host level security and application level security. These three levels are explained as follows.

4.4.2 Network Level Security


Network level security is related to vulnerabilities in public and private networks. At the network level it is important to distinguish public and private clouds. In a private cloud, the attacks, vulnerabilities and risks specific to the network topology are known in advance, and information security personnel need to consider only those.
In a public cloud, changing security requirements will require changes to the network topology and to the manner in which the existing network topology interacts with the cloud provider's network. In a public cloud, data moves to or from the organization, which needs to ensure its confidentiality and integrity. So if a user accesses the cloud over HTTP instead of HTTPS, the risk increases and needs to be pointed out.
In a hybrid cloud, the private and public clouds work together in different environments and have different network topologies. So the challenge here is to look at the risks associated with both topologies.
There are four significant factors needs to be considered in network level security.
 The Confidentiality and Integrity needs to be ensured for data-in-transit to and
from public cloud.
 The Access control including authentication, authorization, and auditing need to be
provided for resources you are using from public cloud.
 The Availability must be ensured for resources in a public cloud those are being
used by your organization or assigned to you by your public cloud providers.
 The Established Model of Network Zones and Tiers should be replaced with
domains.
The above factors are explained in detail as follows.

a) Ensuring Data Confidentiality and Integrity


In the cloud, some resources need to be accessed from a public data center while some reside in a private data center, so resources and data that were confined to a private network get exposed to the Internet and shared over a public network that belongs to a third-party cloud provider. Therefore, data confidentiality and integrity need to be ensured together.
For example, as per an Amazon Web Services (AWS) security vulnerability report, users had used digital signature algorithms to access Amazon SimpleDB and Amazon Elastic Compute Cloud (EC2) over HTTP instead of HTTPS. Because of that, they faced an increased risk that their data could have been altered in transit without their knowledge.
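As a small illustration, the boto3 sketch below forces TLS for S3 access; use_ssl and certificate verification are standard client options, and the signing of requests with the account's secret key is handled by the SDK itself.

import boto3

# Always use HTTPS endpoints and verify server certificates, so data in transit
# cannot be read or altered without detection.
s3 = boto3.client("s3", region_name="us-east-1", use_ssl=True, verify=True)

# Requests made through this client are signed and sent over an encrypted channel.
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])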

b) Ensuring Proper Access Control


As some resources from the private network are being exposed to the public network, an organization using a public cloud faces a significant increase in risk to its data. So there is a need to audit the operations of your cloud provider's network by observing network-level logs and data, and to conduct thorough investigations of them. Reused IP addresses and DNS attacks are examples of this risk factor.

c) Ensuring the Availability of Internet-Facing Resources


Network-level security is needed to ensure the availability of cloud-provided resources, because an increasing amount of data and services is hosted on external devices. BGP prefix hijacking is a good example of this risk factor : prefix hijacking involves announcing an autonomous system address space that belongs to someone else without permission, which affects availability.

d) Replacing the existing model of network zones with Domains


In the cloud, the already established models of network zones and tiers no longer exist. In those models, network security relied on zones such as intranet and internet, and the models were based on exclusion : only individuals and systems in specific roles had access to specific zones. For example, systems within a presentation tier are not allowed to communicate directly with systems in the database tier, but can communicate within the application zone.

4.4.3 Host Level Security


Host-level security is related to the cloud service models (SaaS, PaaS and IaaS) and the deployment models (public, private and hybrid). Besides the known threats to hosts, virtualization threats such as VM escape, system configuration drift, weak access control to the hypervisor and insider threats need to be prevented. Managing vulnerabilities and applying patches therefore becomes much harder than just running a scan. Host-level security for SaaS, PaaS and IaaS is explained as follows.


a) SaaS and PaaS Host Security


In general, a CSP never discloses information about the host platform and operating systems that are in place to secure the hosts; in the context of SaaS and PaaS, the CSP is liable to secure the hosts. To get assurance from the CSP, the user can ask it to share information under a Non-Disclosure Agreement (NDA), or the CSP can share the information via a controls assessment framework such as SysTrust or ISO 27002. Both PaaS and SaaS platforms hide the host operating system from end users behind a host abstraction layer. Host security responsibilities in SaaS and PaaS services are therefore transferred to the CSP, so you do not have to worry about protecting hosts from host-based security threats.

b) IaaS Host Security


In IaaS, the hypervisor is the main controller of all the VMs running on it, so IaaS host security involves securing the virtualization software (the hypervisor) and the guest OSes or virtual servers. The virtualization software sits directly on the bare metal and allows customers to create, destroy and manage virtual instances, so it is important to protect it because it sits between the hardware and the virtual servers. Virtual instances of operating systems such as Windows and Linux are provisioned on top of the virtualization layer and are visible to customers, so VM instances running business-critical applications also need to be protected. If the hypervisor becomes vulnerable, it could expose all users' VM instances and domains to outsiders.
The most common attacks at the host level in a public cloud are :
 Hijacking of accounts that are not properly secured.
 Stealing keys, such as the SSH private keys used to access and manage hosts.
 Attacking unpatched and vulnerable services listening on standard ports such as FTP, NetBIOS and SSH.
 Attacking systems that are not secured by host firewalls.
 Deploying Trojans and embedding viruses in the software running inside the VM.
The recommendations for host-level security are given as follows (a minimal configuration-audit sketch appears after the list).
 Do not allow password-based authentication for shell-based user access.
 Install and configure a host firewall, opening only the minimum ports needed for necessary services.
 Install host-based IDS and IPS.
 Always enable system auditing and event logging.
 Use public and private keys to access hosts in the public cloud.
 Periodically review logs to inspect for suspicious activities.
 Protect the integrity of VM images from unauthorized access.
 Require a superuser password or role-based access for Unix-based host images.
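The following Python sketch is a minimal, illustrative audit of one recommendation above - disabling password-based SSH access. The file path and expected settings are assumptions, and a real hardening check would cover many more controls.

from pathlib import Path

# Desired sshd settings (assumed policy): key-based access only, no root shell.
EXPECTED = {
    "passwordauthentication": "no",
    "permitrootlogin": "no",
}

def audit_sshd(config_path: str = "/etc/ssh/sshd_config") -> list[str]:
    findings = []
    settings = {}
    for line in Path(config_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0].lower()] = parts[1].strip().lower()
    for key, wanted in EXPECTED.items():
        actual = settings.get(key, "<unset>")
        if actual != wanted:
            findings.append(f"{key}: expected '{wanted}', found '{actual}'")
    return findings

if __name__ == "__main__":
    for issue in audit_sshd():
        print("NON-COMPLIANT:", issue)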

4.4.4 Application Level Security


Application security describes the security features and specifications of applications and covers the outcomes of security testing. Security of applications in the cloud is one of the vital success factors for every SaaS or PaaS cloud. Security procedures for software, secure coding guidelines, training material and testing methods are usually a joint effort of the development and security teams. Security standards should be provided to the product development engineers, although product engineering is likely to focus on the application layer; the security design must cover both the application itself and the infrastructure levels that interact with it.
The security team and the product development team must work together to provide better application-level security. External penetration testers are used for source code reviews of applications and to gain insight into attacks; they fulfil the objective of an independent security review of the application and regularly perform attacks and penetration tests on behalf of customers. Firewalls, intrusion detection and prevention systems, integrity monitoring and log inspection can all be deployed as virtual machine applications to enhance server and application protection, along with compliance integrity, as virtual resources migrate from on-site to public cloud environments.
Application security applies at the SaaS and PaaS levels of the SPI (SaaS, PaaS and IaaS) model, so cloud service providers are responsible for securing the applications hosted in their data centers. At the SaaS level the hosted applications need to be protected, while at the PaaS level the platform, databases and runtime engines need to be protected. Designing and implementing new applications that are going to be deployed on a public cloud platform requires re-evaluation of existing application security programs and standards. Web applications such as content management systems, websites, blogs, portals, bulletin boards and discussion forums hosted on cloud platforms are used by both small and large organizations. Web-related attacks therefore need to be prevented by understanding the vulnerabilities in these websites. Cross-site scripting (XSS), SQL injection and malicious file execution are the common attacks at the application level in the cloud.
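A small illustrative sketch, using an assumed table schema and the standard sqlite3 module, shows why parameterized queries defeat the SQL injection attack mentioned above, in contrast with naive string concatenation.

import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Vulnerable: attacker-controlled input is concatenated into the SQL text,
    # so an input such as  ' OR '1'='1  would return every row.
    query = "SELECT id, email FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Safe: the driver binds the value as data, never as SQL syntax.
    return conn.execute(
        "SELECT id, email FROM users WHERE name = ?", (username,)
    ).fetchall()

The same binding discipline applies to any database driver used behind a cloud-hosted web application.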


The common types of attacks at the network, host and application levels are explained in Table 4.4.1.

Network level attacks :
 Eavesdropping - The attacker monitors network traffic in transit and then interprets all unprotected data.
 Replay attack - Valid data is transmitted maliciously and repeatedly to gain access to unauthorized resources.
 Reused IP address - The same IP address is abruptly reassigned to a new customer while still associated with another customer, violating the privacy of the original user.
 DNS attack - The attacker manipulates the translation of a domain name into an IP address so that the sender and the receiver get rerouted through a malicious connection.
 BGP prefix hijacking - A wrong announcement of the IP address space associated with an Autonomous System (AS) is made, so that malicious parties gain access through untraceable IP addresses.
 Sniffer attack - Data flowing on the network is traced and captured by a sniffer program through the NIC, and the data and traffic are rerouted to a malicious connection.
 Port scanning - Open ports that the customer has configured to allow traffic from any source are scanned, and those specific ports become vulnerable to a port scan.
 DoS attack - Authorized users are prevented from accessing services on the network by flooding, disrupting, jamming or crashing them.
 Distributed Denial of Service (DDoS) attack - A DoS attack that occurs from more than one source and more than one location at the same time to flood the server.

Host level attacks :
 Threats to the hypervisor - The hypervisor is responsible for running multiple guest operating systems on a single hardware unit. It is difficult to monitor, so malicious code can take control of the system and block other guest OSes.
 Threats to virtual servers - The self-provisioning feature of virtual servers on an IaaS platform creates the risk that insecure virtual servers may be created.

Application level attacks :
 SQL injection attack - Attackers insert malicious code into standard SQL code, allowing the entire database to be downloaded in illicit ways.
 Cross-site scripting (XSS) - Script tags are embedded in URLs so that when a user clicks on them, JavaScript is executed on the user's machine and the hacker gains control and access to private information.
 EDoS - Economic Denial of Sustainability attacks on pay-as-you-go cloud applications cause a dramatic increase in the cloud utility bill through increased use of network bandwidth, CPU and storage. This type of attack targets the billing model that underlies the cost of providing a service.
 Cookie poisoning - An unauthorized person changes or modifies the content of cookies to obtain credential information and access applications or web pages.
 Backdoor and debug options - Developers leave the debugging option enabled when publishing a web site, so a hacker can easily enter the web site and make changes.
 Hidden field manipulation - The attacker identifies hidden fields in HTML forms, saves the catalogue page and changes the values of the hidden fields posted back to the web page.
 Man-in-the-middle attack - Similar to eavesdropping; the attacker sets up a connection between two users and tries to listen to the conversation between them.

Table 4.4.1 Common types of attacks at the network, host and application levels

4.5 Cloud Security Challenges


Although cloud computing and virtualization can enhance business efficiency by breaking the physical ties between an IT infrastructure and its users, it is important to resolve the increased security threats in order to fully benefit from this new computing paradigm. This applies to SaaS providers in particular. In a cloud environment you share computing services with other companies, and you may have no awareness or control of where the resources are running in a shared pool outside the organization's boundary. Sharing the cloud environment may even give the government reasonable grounds to seize your assets because another tenant has violated compliance law; you put your data at risk of seizure simply because you share the environment. In addition, if you want to switch from one cloud provider to another, the storage services offered by one vendor may be incompatible with another vendor's platform services; for example, Amazon's Simple Storage Service (S3) is incompatible with IBM's Blue Cloud or with the Dell and Google cloud platforms. In a storage cloud, most clients want their data encrypted via SSL (Secure Sockets Layer) while it crosses the Internet in both directions, and most also want it encrypted while it sits in the cloud storage pool. Who, then, controls the encryption / decryption keys when information is encrypted in the cloud - the client or the cloud vendor ? These questions are often left unanswered. Therefore, before moving data to the cloud, make sure that the encryption / decryption keys are working and tested, just as when the data resides on your own servers.
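A minimal sketch of the client-side tactic discussed above, assuming the third-party 'cryptography' package : the customer encrypts data before upload, so the key never leaves the organization. How the key itself is stored (for example, in an HSM or a key-management service) is outside the scope of this sketch.

from cryptography.fernet import Fernet

key = Fernet.generate_key()        # keep this key on premises, never in the cloud
cipher = Fernet(key)

plaintext = b"customer record: account=1234, balance=500"
ciphertext = cipher.encrypt(plaintext)      # safe to place in the storage pool

# Later, after downloading the object back from the cloud:
recovered = cipher.decrypt(ciphertext)
assert recovered == plaintext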
The integrity of data means making certain that data is maintained identically during every operation (e.g. transmission, storage or recovery). In other words, data integrity ensures the consistency and correctness of the data. Ensuring the integrity of information means that it changes only in response to authorized transactions. This sounds good, but you must remember that there is still no common standard for ensuring data integrity.
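In the absence of such a standard, one simple, illustrative integrity check is to record a SHA-256 digest before upload and compare it after download, as the standard-library sketch below shows; this detects unauthorized modification but is not itself a compliance standard.

import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

original = b"quarterly-report-v3"
stored_digest = digest(original)        # recorded locally before upload

downloaded = b"quarterly-report-v3"     # object retrieved from the cloud later
if digest(downloaded) != stored_digest:
    raise RuntimeError("Integrity violation: object was altered in the cloud")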
The use of SaaS services in the cloud means that much less software development is needed in-house; if you do plan to use internally developed code in the cloud, a formal secure software development life cycle (SDLC) becomes even more important. Inadequate use of mashup technology (combinations of web services), which is crucial for cloud applications, will almost certainly lead to unknown security vulnerabilities in those applications. A security model should be integrated into the development tools to guide developers during the development phase and to restrict users to their authorized data once the system has been deployed. With an increasing number of mission-critical processes moving into the cloud, SaaS providers will need to provide log information directly and in real time, probably to their administrators and their customers alike. Someone must take responsibility for monitoring security and compliance controls; they would not be able to comply without the application and data being tracked for the end users.
As the Payment Card Industry Data Security Standard (PCI DSS) includes access to logs, auditors and regulators may refer to them when auditing a security report. Security managers must ensure that they obtain access to the service provider's logs as part of any service agreement. Cloud applications are constantly being enhanced with new features, and users must stay up to date about application improvements to be sure they are protected. The speed at which cloud applications change affects both the SDLC and security. For example, Microsoft's SDLC assumes that mission-critical software will not change substantially for three to five years, but the cloud may require an application to change every couple of weeks. Unfortunately, a secure SDLC cannot deliver a security cycle that keeps pace with such rapid change, which means users must update continuously, because an older version may not work or may not protect the data.
Appropriate fail-over technology is an often-overlooked aspect of securing the cloud : a company cannot survive if a mission-critical application goes offline, though it may survive the loss of non-mission-critical applications. Security must also shift to the device level, so that businesses can ensure their data is secured wherever it goes. Security at the data level is one of the major challenges in cloud computing.
In a cloud world, the majority of compliance requirements cannot simply be enforced as they are. A wide range of IT security and compliance standards regulate most business interactions, and these will have to be translated to the cloud over time. SaaS makes it much more difficult for a customer to determine where its data resides in a network managed by the SaaS provider or a partner of that provider, posing all kinds of data protection, aggregation and security enforcement concerns. Many compliance regulations require that data not be mixed with other data on shared servers or databases. Some national governments place strict restrictions on what data their citizens can store and for how long, and some banking regulations require that customers' financial data stay in their own countries. Many mobile IT users can access business data and infrastructure through cloud-based applications without going through the corporate network, which increases the need for businesses to monitor security between mobile and cloud-based users. Placing large amounts of confidential information in a global cloud also exposes companies to wide-ranging distributed threats - attackers no longer have to come and steal data, because it can all be found in one "virtual" location. Cloud virtualization efficiencies require that virtual machines from multiple organizations be co-located on the same physical resources. Although traditional data center security remains in place in the cloud environment, the physical separation and hardware-based security of virtual machines on the same server cannot fully protect them from attack. Management access is via the Internet rather than through the direct, on-site, monitored and restricted connections of the conventional data center model. This raises risk, demands visibility, and requires strict monitoring of system control modifications and restrictions on access control.

The complex and flexible design of virtual machines makes it hard to maintain and audit security : it is difficult to demonstrate the security status of a device or to detect the location of an unsafe virtual machine. No matter where a virtual machine is located in the virtual environment, intrusion detection and prevention systems must be able to detect malicious activity on it. The interconnection of several virtual machines increases the attack surface and the risk of compromise spreading between machines. Individual virtual machines and physical servers in the cloud environment use the same operating systems, business applications and web applications, which raises the threat of an attack or of malware remotely exploiting vulnerabilities. Virtual machines also become vulnerable as they are moved between the private cloud and the public cloud. A cloud system that is completely or partially shared has a greater attack surface and is thus more at risk than a dedicated resource environment. Operating systems and application files in a virtualized cloud environment reside on shared physical infrastructure, which requires system, file and activity monitoring to give corporate clients confidence and auditable proof that their resources have not been compromised or tampered with.
In the cloud computing environment, the organization may use cloud resources for which the subscriber, not the cloud provider, is responsible for patching, so patch maintenance awareness is essential. Companies are frequently required to prove that their conformity with security regulations, standards and auditing practices is consistent, irrespective of the location of the systems on which the data resides. In the cloud, data is mobile : it may be placed on on-premises physical servers, on on-site virtual machines or off premises on virtual cloud computing services, and auditors and practising managers may have to reconsider how they assess it. Many companies are likely to rush into cloud computing without seriously considering the security implications in their effort to profit from its benefits, including significant cost savings.
To create zones of trust in the cloud, the virtual machines need to protect themselves, essentially moving the perimeter to the virtual machine itself. Enterprise perimeter security is provided through firewalls, network segmentation, IDS/IPS, monitoring tools, De-Militarized Zones (DMZs) and the security policies associated with them. These security strategies and policies control the data that resides in or transits behind the perimeter. In the cloud computing environment, the cloud service provider is responsible for the security and privacy of the customer's data.


4.5.1 Key Privacy Issues in the Cloud


Privacy deals with the collection, use, retention and disclosure of Personally Identifiable Information (PII) or data. According to the American Institute of Certified Public Accountants (AICPA), privacy is defined as,
"Privacy is nothing but the rights and obligations of individuals and organizations with respect to collection, retention and disclosure of personal information".
Although privacy is an important aspect of security, most of the time it is ignored by users. Privacy raises many concerns related to data collection, use, retention and storage in the cloud, explained as follows.

a) Compliance issue
Compliance is related to the regulatory standards that a country's laws or legislation provide for the use of personal information and data privacy. Compliance places restrictions on how cloud service providers may use or share personally identifiable information. Various regulatory standards for data privacy exist in the USA, such as the USA PATRIOT Act, HIPAA, GLBA and FISMA.
The compliance concern depends on various factors such as applicable laws, regulations, standards, contractual commitments and privacy requirements. For example, because the cloud is a multitenant environment, users' data is stored across multiple countries, regions or states, and each region or country has its own legislation on the use and sharing of personal data, which restricts how such data may be used.

b) Storage issue
In the cloud, storage is a major issue because the multitenant environment makes multiple copies of users' data and stores them in multiple data centers across multiple countries. Users therefore never know where, or in which country, their personal data is stored. The storage concern is about where users' data resides : where is the data stored ? Was it transferred to a datacenter in another country ? What privacy standards are enforced by those countries, and do they limit the transfer of personal data ?

c) Retention issue
The retention issue is related to the duration for which personal data is kept in storage and the retention policies that govern it. Each Cloud Service Provider (CSP) has its own set of retention policies governing the data, so the user or organization has to examine the CSP's retention policy and its exceptions.


d) Access issue
The access issue is related to an organization's ability to provide individuals with access to their personal information and to comply with stated requests. The user or organization has the right to know what personal data is kept in the cloud and can request that the CSP stop processing it or delete it from the cloud.

e) Auditing and monitoring


The organization has the right to know which audit policies have been implemented by the CSP. It can monitor the CSP's activities and assure its stakeholders that the privacy requirements for PII in the cloud are met.

f) Destruction of data
At the end of the retention period, CSPs are expected to destroy PII. The concern here is that organizations never know whether their data or PII in the cloud has actually been destroyed by the CSP, whether additional copies have been kept, or whether it has merely been made inaccessible to the organization.

g) Privacy and security breaches


Users in the cloud never know whether security breaches have occurred, so negligence by the CSP may lead to privacy breaches, which have to be resolved by the CSP to avoid inaccessibility.

4.6 Software-as-a-Service Security


Future cloud models will likely use the Internet to fulfil customers' requirements via SaaS and other XaaS models together with Web 2.0 collaboration technologies. The move to cloud computing not only leads to the creation of new business models, but also creates new security problems and requirements, as previously mentioned. The evolutionary steps in the cloud service models are shown in Fig. 4.6.1.
For the near future, SaaS is likely to remain the dominant model in cloud services, and it is in this field that security practices and monitoring are most needed. As with a managed service provider, businesses and end users must look at a vendor's data protection policies before using its services, to avoid losing or being unable to access their data.


Fig. 4.6.1 Evolutionary steps in the cloud service models

The research firm Gartner lists seven security problems which should be discussed with a cloud computing provider.
 Data location : Is it possible for the provider to check data location ?
 Data segregation : Ensure that encryption is effective at all times and that the encryption schemes were designed and tested by qualified experts.
 Recovery : Find out what will happen to data in the event of a disaster. Does the provider offer full restoration ? If so, how long does it take ?
 Privileged user access : Find out who has privileged access to data and how such administrators are hired and managed.
 Regulatory compliance : Ensure the vendor is willing to be audited externally and/or certified for security.
 Long-term viability : What happens to the data if the company goes out of business ? How, and in what format, will the data be restored ?
 Investigative support : Is the vendor able to investigate any inappropriate or illegal activity ?
It is now more difficult to assess data protection, which means data security roles are more critical than in past years. Beyond Gartner's recommendations, one useful tactic is to encrypt the data yourself : if you encrypt the data using a trustworthy algorithm, the data will be accessible only with the decryption key, regardless of the service provider's security and encryption policies. This, of course, leads to a further problem : how do you manage private keys in a pay-on-demand computing infrastructure ? To deal with these security issues, along with those mentioned earlier, SaaS suppliers will have to incorporate and enhance the security practices that managed service providers offer, and develop new practices as the cloud environment evolves. A structured agreement on the security organization and its initiatives is one of the most critical activities for a security team : it fosters a shared view of what the security leadership is and aims to achieve, which encourages 'ownership' of the group's success.

4.7 Security Governance


A security management (steering) committee should be set up with the goal of providing guidance on security initiatives and their alignment with business and IT strategies. One of the outcomes of the steering committee is usually a security charter. The charter must clearly define the roles and responsibilities of the security team and of the other groups involved in performing information security functions. The lack of a formalized strategy can lead to an unsupportable operating model and security level. Furthermore, a lack of attention to security governance may lead to failure to satisfy key business needs, including risk management, security monitoring, application security and sales support. The inability to govern and manage tasks correctly may also leave potential security risks unaddressed and opportunities to improve the business missed, because the security team is not focused on the key security functions and activities of the business.
The essential factors required in security governance are explained as follows.

1. Risk Assessment
Information security risk assessment is crucial for helping the information security organization make informed decisions when balancing the dueling goals of business utility and asset protection. Failure to carry out formal risk assessments can contribute to an increase in information security audit findings and can compromise certification goals, leading to an ineffective, inefficient selection of security controls that cannot adequately mitigate information security risks. A structured information security risk management process can proactively identify, plan for and manage security risks on a daily or as-required basis. Applications and infrastructure should also receive further, more comprehensive technical risk assessments in the form of threat modeling, which helps product management and engineering groups be more proactive in design and testing and in collaborating with the internal security team. Threat modeling requires both IT and business process knowledge, as well as technical knowledge of how the applications or systems under review work.


2. Risk Management
The identification of technological assets; identification of data with its connections to
processes, applications, and storage of data; and assignment of ownership with custodial
responsibilities are part of effective risk management. The risk management measures
will also involve maintaining an information asset repository. Owners have the
responsibility and privileges to ensure the confidentiality, integrity, availability and
privacy of information assets, including protective requirements. A formal risk
assessment process for allocating security resources related to business continuity must
be developed.

3. Third-Party Risk Management


As SaaS progresses into cloud computing for the storage and processing of customer data, security threats involving third parties must also be handled effectively. A lapse in the third-party risk management framework may harm the provider's reputation and lead to revenue losses and legal proceedings if it is found that the provider has not carried out due diligence on its third-party vendors.

4. Security Awareness
Security awareness and culture are among the few effective methods for managing human risks in security. Failure to provide people with adequate knowledge and training may expose the organization to a number of security risks in which people, rather than systems or application vulnerabilities, are the threats and entry points. The risks caused by the lack of an effective security awareness program include social engineering attacks, damage to reputation, slow responses to potential security incidents and inadvertent leakage of customer data. A one-size-fits-all approach to security awareness is not necessarily right for all SaaS organizations; an information security awareness and training program that adapts the information and training to the person's role in the organization is more important. For example, development engineers can receive security awareness training in the form of secure coding and testing training, while data privacy and security certification training can be provided to customer service representatives. An ideal approach combines both generic and role-specific training.

5. Security Portfolio Management


Security portfolio management is important for the efficient and successful operation of any information security program, given the rapid provisioning and interactive nature of cloud computing. A lack of portfolio and project management discipline can result in projects never being completed and never realizing their expected returns, and in excessive and unrealistic workload expectations because projects are not prioritized according to policy, goals and resource availability. The security team should ensure that, for each new project it undertakes, a project plan and a project manager with appropriate training and experience are in place, so that the project can be seen through to completion. Portfolio and project management capability can be enhanced by developing methodology, tools and processes that support the expected complexity of projects, for both traditional business practices and cloud-based approaches.

6. Security Standards, Guidelines and Policies


Many resources and templates are available for developing information security policies, standards and guidelines. A cloud computing security team should first identify the information security and business requirements that are specific to cloud computing, SaaS and collaborative applications, and should then develop the policies, supporting standards and guidelines, which should be documented and implemented. These policies, standards and guidelines should be reviewed regularly (at least annually), or whenever there are significant changes in the business or IT environment, in order to remain relevant. Outdated or vague security standards, guidelines and policies can lead to misrepresentation and inadvertent disclosure of information, because the cloud computing business model changes frequently. Maintaining the accuracy and relevance of information security standards, guidelines and policies is therefore important as business initiatives, the business environment and the risk landscape change. Such standards, guidelines and policies also constitute the basis for maintaining consistent performance and continuity of expertise during resource turnover.

7. Trainings and education


Without appropriate training and mentorship programs, the security team may not be prepared to address the goals of the business. Programs should be developed to provide the security team and its internal partners with basic security and risk management skills and knowledge. This involves a formal process for assessing and aligning skills to the needs of the security team and providing appropriate training and mentorship covering a broad base of fundamental security topics, including data protection and risk management. The security challenges facing an organization will change as the cloud computing business model and its associated services change.


8. Security Monitoring and Incident Response


Centralized security information management systems should be used to provide notification of security vulnerabilities and to monitor systems continuously through automated technologies. They should be integrated with network and other system monitoring processes to support dedicated monitoring activities such as security information management, security event management and the operation of security operation centers. Regular, independent security testing by third parties should also be incorporated. Many of the security threats and issues in SaaS lie in the application and data layers and require different security management approaches from conventional infrastructure and perimeter controls, because the nature and severity of threats and attacks against SaaS organizations change dynamically. The company may therefore need to expand its security monitoring capability to include application and data activity, which may also require experts in application security and in the unique aspects of cloud privacy. Without that capacity and expertise, a company cannot detect and prevent security threats or attacks against its customer data and service stability.

9. Requests for Information security during Sales Support


The security of the cloud computing customer is a top priority and a major concern, and the absence of information security representatives who can help the sales team address customers' concerns could lead to the loss of a sales opportunity. Responding to requests for information and supporting sales are therefore part of the responsibilities of the organization's SaaS security team, which vouches for the integrity of the provider's security business model, compliance with regulations and certifications, the company's reputation, competitiveness and marketability. Sales support teams rely on the security team's ability to provide truthful, clear and concise answers to customer requests such as a Request For Information (RFI) or Request For Proposal (RFP). A structured process and a knowledge base of frequently requested information provide significant efficiency and prevent the RFI/RFP process from being supported on an ad-hoc basis.

10. Business Continuity Plan and Disaster Recovery


The goal of Business Continuity (BC) and Disaster Recovery (DR) planning is to reduce the impact of an adverse event on business processes to an acceptable level. Business continuity and resilience services ensure continuous operation across all layers of the business and help the organization avoid, prepare for and recover from interruptions. SaaS services that facilitate uninterrupted communication can not only help the company recover from a failure, but can also reduce the overall complexity, cost and risk of managing its most critical applications on a day-to-day basis. The cloud also offers attractive prospects for cost-effective BC/DR solutions.

11. Vulnerability Assessment


Vulnerability assessment classifies network assets so that vulnerability management programs, including patching and upgrading, can be prioritized more efficiently. It measures risk reduction by setting targets for reduced vulnerability exposure and faster mitigation. Vulnerability management should be incorporated into the business, together with investigation, patch management and upgrade processes, before vulnerabilities are exploited.

12. Data Privacy


In order to maintain data privacy, a risk assessment and gap analysis of controls and procedures must be carried out. Depending on the size and scale of the organization, either an individual or a team must be assigned and held responsible for maintaining privacy. A member of the privacy or security team should work with the legal team to address privacy issues and concerns, and, as with security, there should be a privacy management committee to assist in making decisions about data protection. A professional consultant or qualified staff member can ensure that the company is able to meet the data protection requirements of its customers and regulators. Mitigating privacy concerns requires relevant skills, training and expertise that are not normally found in a security team.

13. Computer Forensics


Computer forensics is used for data gathering and analysis. It involves responding to an unfortunate incident by collecting and preserving information, analyzing data to reconstruct events and assessing the status of the event. Network forensics involves the recording and analysis of network events to determine the nature and source of information abuse, security attacks and other such incidents. This is usually achieved by recording or capturing packets over the long term from a key point or points within the infrastructure and then data mining them for content analysis and event re-creation.

14. Password Testing


In cloud computing, distributed password crackers can be used by the SaaS security team or its customers to periodically test password strength.
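The toy Python estimate below illustrates the idea; it is an assumption-laden sketch, not a replacement for a distributed cracker or an audited password policy.

import math
import string

def estimate_entropy_bits(password: str) -> float:
    # Rough estimate: size of the character set raised to the length, in bits.
    charset = 0
    if any(c.islower() for c in password): charset += 26
    if any(c.isupper() for c in password): charset += 26
    if any(c.isdigit() for c in password): charset += 10
    if any(c in string.punctuation for c in password): charset += len(string.punctuation)
    return len(password) * math.log2(charset) if charset else 0.0

def is_weak(password: str, minimum_bits: float = 60.0) -> bool:
    return estimate_entropy_bits(password) < minimum_bits

print(is_weak("summer2020"))      # True  -- easily cracked
print(is_weak("T7#qv!9zLp$2wX"))  # False -- far more resistant to brute force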


15. Security Images


Virtualization-based cloud computing allows secure builds of "gold" VM images to be created, providing up-to-date protection and reduced exposure through offline patching. Offline VMs can be patched off-network, making the effects of security changes easier, cheaper and more productive to test. Duplicating VM images into a production-like environment, implementing a security change and testing its impact at low cost minimizes start-up time and removes major obstacles to making security changes in a production environment.

16. Compliance and Security Investigation logs


Cloud computing can be used to generate logs in the cloud, index them in real time and take advantage of instant search results. A real-time view can be obtained because the indexing instances can be scaled based on the logging load as required. Cloud computing also provides the option of enhanced logging.

17. Secure Software Development Life Cycle


The Secure Software Development Life Cycle defines specific threats and the risks that
they present, develops and executes appropriate controls to combat threats and assists the
organization and/or its clients in the management of the risks they pose. It aims to ensure
consistency, repeatability and conformity.

Fig. 4.7.1 Secure Software Development Life Cycle

The Secure Software Development Life Cycle consists of six phases which are shown
in Fig. 4.7.1 and described as follows.


I. Initial Investigation : To define and document project processes and goals in the
security policy of the program.
II. Requirement Analysis : To analyze recent security policies and systems,
assessment of emerging threats and controls, study of legal issues, and perform
risk analysis.
III. Logical design : To develop a security plan; planning for incident response
measures; business responses to disaster; and determine whether the project can
be carried on and/or outsourced.
IV. Physical design : Selecting technologies to support the security plan, developing a
solution definition that is successful, designing physical security measures to
support technological solutions and reviewing and approving plans.
V. Implementation : Purchase or create solutions for security. Submit a tested
management package for approval at the end of this stage.
VI. Maintenance : The application code is monitored, tested and maintained continuously for ongoing improvement. Additional security processes are developed to support application development projects, such as external and internal penetration testing and standard security requirements for data classification. Formal training and communication should also be introduced to raise awareness about process improvement.

18. Security architectural framework


A security architectural framework must be developed for implementing authentication, authorization, access control, confidentiality, integrity, non-repudiation and security management across every application in the enterprise. It is also used for evaluating processes, operating procedures, technology specifications, people and organizational management, compliance with security programs, and reporting. To support this, there should be a security architecture document that describes the security and privacy principles needed to achieve the business goals. Documentation is necessary for assessing risk management plans, asset-specific metrics, physical security, system access control, network and computer management, application development and maintenance, business continuity and compliance. The major goals of a security architectural framework are authentication, authorization, availability, confidentiality, integrity, accountability and privacy. With such an architecture document in place, new designs can be evaluated against the principles it describes, allowing more coherent and effective design reviews.

4.8 Virtual Machine Security


In a traditional network, several security attacks arise, such as buffer overflows, DoS attacks, spyware, malware, rootkits, Trojan horses and worms. Newer attacks may arise in a cloud environment, such as hypervisor malware, guest hopping, hijacking and VM rootkits. The man-in-the-middle attack on VM migration is another type of attack on virtual machines. Passive attacks on VMs usually steal sensitive information or passwords, while active attacks manipulate kernel data structures and cause significant damage to cloud servers. To counter security attacks on VMs, network-level or hardware-level IDS can be used for protection, shepherding programs can be applied for code execution control and verification, and additional security technologies can be used. These additional technologies include VMware's vSafe and vShield software, hypervisor security enforcement and Intel vPro technology, together with dynamic optimization infrastructure, hardened OS environments, or isolated sandboxing and execution.
Physical servers are consolidated onto virtualized servers running several virtual machine instances in the cloud environment. Firewalls, intrusion detection and prevention, integrity monitoring and log inspection can all be deployed as software on virtual machines to enhance the integrity of servers, increase protection and maintain compliance as applications move from on-site to public cloud environments as virtual resources. The security software loaded on a virtual machine should include a two-way stateful firewall that enables virtual machine isolation and location awareness, allowing tighter policy and the flexibility to move the virtual machine from on-premises to cloud resources, and making centralized management of the server firewall policy easier. Integrity monitoring and log inspection should be applied at the virtual machine level. This approach to VM security, which connects the system back to a home server, has the benefit that the security software can be incorporated into a single software agent providing consistent, cloud-wide control and management, while integrating seamlessly with existing security infrastructure investments and delivering economies of scale, deployment and cost savings.

4.9 IAM
Identity and Access Management (IAM) is a vital function for every organization, and SaaS customers have a fundamental expectation that their data will be handled according to the principle of least privilege. The least privilege principle says that only the minimum access required to perform an operation should be granted, and only for the minimum amount of time required. Aspects of current models, including trust principles, privacy implications and the operational aspects of authentication and authorization, are challenged in a cloud environment where services are delivered on demand and evolve continuously. To meet these challenges, SaaS providers need to align their efforts by testing new models and management processes for IAM that extend end-to-end trust and identity across the cloud and their enterprises. The balance between usability and security is an additional issue : if a good balance is not achieved, obstacles to the successful completion of support and maintenance activities will impact both businesses and their user groups.
Because the cloud is composed of many services deployed on a large infrastructure, it requires multiple security mechanisms to protect it from failure.
Identity and access management is a security framework composed of policy and governance components used for the creation, maintenance and termination of digital identities with controlled access to shared resources. It comprises multiple processes, components, services and standard practices, and focuses on two parts, namely identity management and access management. Directory services are used in IAM to create a repository for identity management, authentication and access management. IAM provides many features such as user management, authentication management, authorization management, credential and attribute management, compliance management, and monitoring and auditing. The lifecycle of identity management is shown in Fig. 4.9.1.

Fig. 4.9.1 Lifecycle of Identity Management

The IAM architecture is made up of several processes and activities (see Fig. 4.9.2). The
processes supported by IAM are given as follows.


a) User management - It provides processes for managing the identity of different


entities.
b) Authentication management - It provides activities for management of the
process for determining that an entity is who or what it claims to be.
c) Access management - It provides policies for access control in response to request
for resource by entity.
d) Data management - It provides activities for propagation of data for
authorization to resources using automated processes.
e) Authorization management - It provides activities for determining the rights
associated with entities and decide what resources an entity is permitted to access
in accordance with the organization’s policies.
f) Monitoring and auditing - Based on the defined policies, it provides monitoring,
auditing, and reporting compliance by users regarding access to resources.
The activities supported by IAM are given as follows; a small illustrative provisioning sketch appears after the list.
a) Provisioning - The provisioning has essential processes that provide users with
necessary access to data and resources. It supports management of all user
account operations like add, modify, suspend, and delete users with password
management. By provisioning the users are given access to data, systems,
applications, and databases based on a unique user identity. Deprovisioning does the reverse : it deactivates or deletes the user's identity and revokes the associated privileges.
b) Credential and attribute management - The Credential and attribute management
prevents identity impersonation and inappropriate account use. It deals with
management of credentials and user attributes such as create, issue, manage and
revoke users to minimize the business risk associated with it. The individuals’
credentials are verified during the authentication process. The Credential and
attribute management processes include provisioning of static or dynamic
attributes that comply with a password standard, encryption management of
credentials and handling access policies for user attributes.
c) Compliance management - The Compliance management is the process used for
monitoring and tracking access rights and privileges to ensure the security of an enterprise's resources. It also helps auditors verify compliance with various access control policies and standards. It includes practices such as access monitoring, periodic auditing, and reporting.


d) Identity federation management - Identity federation management is the process


of managing the trust relationships beyond the network boundaries where
organizations come together to exchange the information about their users and
entities.
e) Entitlement management - In IAM, entitlements are nothing but authorization
policies. The Entitlement management provides processes for provisioning and
deprovisioning of privileges needed for the users to access the resources
including systems, applications, and databases.
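The sketch below is a toy, in-memory illustration of the provisioning and deprovisioning activity described above, with least-privilege entitlement checks. Every class and method name is an assumption made for illustration; real IAM deployments rely on directory services and much richer policy engines.

from dataclasses import dataclass, field

@dataclass
class Identity:
    user_id: str
    entitlements: set[str] = field(default_factory=set)
    active: bool = True

class IdentityStore:
    def __init__(self) -> None:
        self._users: dict[str, Identity] = {}

    def provision(self, user_id: str, entitlements: set[str]) -> Identity:
        # Grant only the entitlements explicitly requested (least privilege).
        identity = Identity(user_id, set(entitlements))
        self._users[user_id] = identity
        return identity

    def deprovision(self, user_id: str) -> None:
        # Deactivate the identity and revoke every privilege it held.
        identity = self._users[user_id]
        identity.active = False
        identity.entitlements.clear()

    def is_authorized(self, user_id: str, entitlement: str) -> bool:
        identity = self._users.get(user_id)
        return bool(identity and identity.active and entitlement in identity.entitlements)

store = IdentityStore()
store.provision("alice", {"read:reports"})
assert store.is_authorized("alice", "read:reports")
store.deprovision("alice")
assert not store.is_authorized("alice", "read:reports")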

Fig. 4.9.2 IAM Architecture


4.10 Security Standards


Security standards are needed to define the processes, measures and practices required
to implement the security program in a web or network environment. These standards
also apply to cloud-related IT exercises and include specific actions to ensure that a secure
environment is provided for cloud services along with privacy for confidential
information. Security standards are based on a set of key principles designed to protect a
trusted environment of this kind. The following sections explain the different security
standards used in protecting cloud environment.

4.10.1 Security Assertion Markup Language (SAML)


Security Assertion Markup Language (SAML) is a security standard developed by
OASIS Security Services Technical Committee that enables Single Sign-On technology
(SSO) by offering a way of authenticating a user once and then communicating
authentication to multiple applications. It is an open standard for exchanging
authentication and authorization data between parties, in particular, between an identity
provider and a service provider.
It enables Identity Providers (IdPs) to pass permissions and authorization credentials
to Service Providers (SP). A range of existing standards, including SOAP, HTTP, and
XML, are incorporated into SAML. SAML transactions use Extensible Markup Language (XML) for standardized communication between the identity provider and service providers. SAML is the link between the authentication of a user's identity and the authorization to use a service. The majority of SAML transactions are in a standardized XML form; an XML schema is used to specify SAML assertions and protocols. For authentication and message integrity, both SAML 1.1 and SAML 2.0 use digital signatures based on the XML Signature standard. XML encryption is supported in SAML 2.0 but not in SAML 1.1, which lacks encryption capabilities. SAML defines assertions, protocols, bindings and profiles based on XML.
A binding of SAML defines how SAML requests and responses map to the standard
messaging protocols. A SAML binding is a mapping of a SAML protocol message onto
standard messaging formats and/or communications protocols. SAML Core refers to the
general syntax and semantics of SAML assertions and to the protocol for requesting and
transmitting all those assertions from one system entity to another. SAML standardizes
user authentication, entitlements and attribute-related knowledge questions and answers
in an XML format. A platform or application that can transmit security data can be a
SAML authority, often called the asserting party. The assertion consumer or requesting
party is a partner site receiving security information. The information exchanged covers
an authentication status of a subject, access permission and attribute information. SAML


claims are usually passed to service providers from Identity Providers. Assertions include
claims used by service providers to make decisions about access control. A SAML
protocol describes how certain SAML elements (including assertions) are packaged
within SAML request and response elements, and gives the processing rules that SAML
entities must follow when producing or consuming these elements. For the most part, a
SAML protocol is a simple request-response protocol.
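As a hedged illustration of the assertion structure described above, the sketch below builds a bare-bones, unsigned SAML-style assertion with the Python standard library. Real deployments use a dedicated SAML library together with XML digital signatures; the issuer and subject values here are placeholders.

import xml.etree.ElementTree as ET

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"

def build_assertion(issuer: str, subject: str) -> str:
    ET.register_namespace("saml", SAML_NS)
    assertion = ET.Element(f"{{{SAML_NS}}}Assertion", {"Version": "2.0"})
    ET.SubElement(assertion, f"{{{SAML_NS}}}Issuer").text = issuer
    subj = ET.SubElement(assertion, f"{{{SAML_NS}}}Subject")
    ET.SubElement(subj, f"{{{SAML_NS}}}NameID").text = subject
    return ET.tostring(assertion, encoding="unicode")

# The identity provider would sign this assertion and send it to the service
# provider, which uses it to make an access-control decision.
print(build_assertion("https://idp.example.org", "alice@example.org"))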

4.10.2 Open Authentication (OAuth)


OAuth is a standard protocol which allows secure API authorization for various types
of web applications in a simple, standard method. OAuth is an open standard for
delegating access and uses it as a way of allowing internet users to access their data on
websites and applications without passwords. It is a protocol that enables secure
authorization from web, mobile or desktop applications in a simple and standard way. It
is a publication and interaction method with protected information. It allows developers
access to their data while securing credentials of their accounts. OAuth enables users to
access their information which is shared by service providers and consumers without
sharing all their identities. This mechanism is used by companies such as Amazon,
Google, Facebook, Microsoft and Twitter to permit the users to share information about
their accounts with third party applications or websites. It specifies a process for resource
owners to authorize third-party access to their server resources without sharing their
credentials. Over secure Hypertext Transfer Protocol (HTTPs), OAuth essentially allows
access tokens to be issued to third-party clients by an authorization server, with the
approval of the resource owner. The third party then uses the access token to access the
protected resources hosted by the resource server.
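As a rough illustration of this flow, the Python sketch below exchanges an authorization code for an access token and then calls a protected API with it. The endpoint URLs, client identifiers and paths are placeholders invented for the example; the exact parameters depend on the authorization server being used.

import requests

# Placeholder endpoints - every provider publishes its own values.
TOKEN_URL = "https://auth.example.com/oauth/token"
API_URL = "https://api.example.com/v1/profile"

def exchange_code_for_token(auth_code, client_id, client_secret, redirect_uri):
    """Authorization-code grant : trade the code for an access token over HTTPS."""
    response = requests.post(TOKEN_URL, data={
        "grant_type": "authorization_code",
        "code": auth_code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }, timeout=10)
    response.raise_for_status()
    return response.json()["access_token"]

def call_protected_api(access_token):
    """Use the bearer token instead of the resource owner's password."""
    response = requests.get(API_URL,
                            headers={"Authorization": "Bearer " + access_token},
                            timeout=10)
    response.raise_for_status()
    return response.json()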

4.10.3 Secure Sockets Layer and Transport Layer Security
Secure Sockets Layer (SSL) and Transport Layer Security (TLS) are cryptographically
secure protocols to provide security and data integrity for TCP/IP based
communications. The network connections segments in the transport layer are encrypted
by the TLS and SSL. In web browsers, e-mail, instant messaging and voice over IP, many
implementations of these protocols are widely used. TLS is the latest updated IETF
standard protocol for RFC 5246. The TLS protocol allows client/server applications to
communicate across a network in a way that avoids eavesdropping, exploitation,
tampering and message forgery. TLS uses cryptography to ensure endpoint
authentication and data confidentiality.

®
TECHNICAL PUBLICATIONS - An up thrust for knowledge
Cloud Computing 4 - 37 Resource Management and Security in Cloud

TLS authentication is usually one-way : only the server is authenticated, because it is the server's identity that the client needs to know, and the client itself is not authenticated. This means that, at the browser level, the browser validates the server's certificate and checks the digital signatures of the chain of Certification Authorities (CAs) that issued it, but no corresponding validation identifies the end user to the server. To be sure of the server's identity, the end user must verify the identifying information contained in the server's certificate, in particular that the URL, name or address being visited matches what the certificate specifies. A malicious website cannot use the valid certificate of another website, because it does not possess the corresponding private key and therefore cannot complete the handshake that the genuine certificate requires. Since only a trustworthy CA can bind a URL into a certificate, comparing the apparent URL with the URL specified in the certificate is a meaningful check. TLS also supports a more secure bilateral connection mode, which ensures that both ends of the connection communicate with the party they believe they are connected to; this is called mutual authentication. For mutual authentication, the TLS client side must also hold a certificate.
TLS involves three basic phases :
a) Cipher suite negotiation : the client and the server negotiate the cipher suite, that is, the set of algorithms to be used for the session.
b) Authentication and key exchange : the peers decide on the authentication and key exchange algorithms, which are public key algorithms.
c) Symmetric encryption and message authentication : the session data is encrypted with the negotiated symmetric cipher, and message authentication codes based on cryptographic hash functions protect its integrity.
Once these decisions are made, the transfer of data can be commenced.
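The sketch below uses Python's standard ssl module to open a TLS connection, which exercises exactly these phases : the negotiated protocol version and cipher suite come from the handshake, and the server certificate is validated against the system's trusted CAs. The host name used here is only an example.

import socket
import ssl

def probe_tls(host, port=443):
    """Open a one-way authenticated TLS connection and report what was negotiated."""
    context = ssl.create_default_context()   # loads the trusted CA certificates
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # server_hostname enables SNI and host name checking against the certificate
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            print("Protocol :", tls_sock.version())     # e.g. 'TLSv1.3'
            print("Cipher   :", tls_sock.cipher())      # negotiated cipher suite
            print("Peer certificate subject :", tls_sock.getpeercert().get("subject"))

probe_tls("www.example.com")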
Summary
 Resource management is a process for the allocation of computing, storage,
networking and subsequently energy resources to a set of applications, in a
context that aims to collectively meet the performance goals of infrastructure
providers, cloud users and applications.
 In Inter cloud resource provisioning, developers have to consider how to design
the system to meet critical requirements such as high throughput, HA, and fault
tolerance. The infrastructure for operating cloud computing services may be
either a physical server or a virtual server.
 Resource provisioning schemes are used for the rapid discovery of cloud
computing services and data in cloud.
 The provisioning of storage resources in cloud is often associated with the terms
like distributed file system, storage technologies and databases.
 There are three methods of resource provisioning namely Demand-Driven,
Event-Driven and Popularity-Driven.
 The cloud providers can expand or redimension their provision capacity in a
competitive and dynamic manner by leasing the computation and storage
resources of other cloud service providers with the use of Intercloud
architectural principles.
 Although cloud computing has many benefits in most aspects, security issues in cloud platforms have led many companies to hesitate to migrate their essential resources to the cloud.
 Even though cloud computing and virtualization can enhance business efficiency by breaking the physical ties between an IT infrastructure and its users, it is important to resolve the increased security threats in order to fully benefit from this new computing paradigm.
 Some security issues in cloud platforms are trust, privacy, lack of security and
copyright protection.
 Key privacy issues in the cloud computing are Compliance issue, Storage
concern, Retention concern, Access Concern, Auditing and monitoring and so
on.
 The lack of a formalized security strategy can lead to an unsupportable operating model and an inadequate level of security.
 The essential factors required in security governance are Risk Assessment and
management, Security Awareness, Security Portfolio Management, Security
Standards, Guidelines and Policies, Security Monitoring and Incident Response,
Business Continuity Plan and Disaster Recovery and so on.
 To overcome the security attacks on VMs, Network level IDS or Hardware level
IDS can be used for protection, shepherding programs can be applied for code
execution control and verification and additional security technologies can be
used.
 Identity and access management is the security framework composed of policy and governance components used for the creation, maintenance and termination of digital identities with controlled access to shared resources. It is composed of multiple processes, components, services and standard practices.
 Security standards are needed to define the processes, measures and practices
required to implement the security program in a web or network environment.
 Security Assertion Markup Language (SAML) is a security standard that enables
Single Sign-On technology (SSO) by offering a way of authenticating a user once
and then communicating authentication to multiple applications.
 OAuth is an open standard for delegating access and uses it as a way of allowing
internet users to access their data on websites and applications without
passwords.
 Secure Sockets Layer (SSL) and Transport Layer Security (TLS) are
cryptographically secure protocols to provide security and data integrity for
TCP/IP based communications.

Short Answered Questions
Q.1 List any four host security threats in public IaaS. AU : Dec.-17
Ans. : The most common host security threats in a public IaaS cloud are
 Hijacking of accounts that are not properly secured.
 Stealing keys, such as the SSH private keys used to access and manage hosts.
 Attacking unpatched and vulnerable services listening on standard ports such as FTP, NetBIOS and SSH.
 Attacking systems that are not protected by host firewalls.
 Deploying Trojans and embedded viruses in the software running inside the VM.
Q.2 Mention the importance of transport level security. AU : Dec.-16
Ans. : The TLS protocol allows client / server applications to communicate across a
network in a way that avoids eavesdropping, exploitation, tampering and message
forgery. TLS uses cryptography to ensure endpoint authentication and data
confidentiality. TLS authentication is usually one-way : only the server is authenticated and the client is not. This means that, at the browser level, the browser validates the server's certificate and checks the digital signatures of the chain of Certification Authorities (CAs) that issued it, while no corresponding validation identifies the end user to the server. The end user must verify the identifying information contained in the server's certificate in order to be sure of the server's identity.
Q.3 Discuss on the application and use of identity and access management. AU : Dec.-16
OR What is identity and access management in a cloud environment ? AU : Dec.18
Ans. : The Identity and access management in cloud computing is the security
framework composed of policy and governance components used for creation,
maintenance and termination of digital identities with controlled access to shared resources. It is composed of multiple processes, components, services and standard
practices. It focuses on two parts namely Identity management and access
management. The directory services are used in IAM for creating a repository for
identity management, authentication and access management. The IAM provides many
features like user management, authentication management, authorization
management, credential and attribute management, compliance management,
monitoring and auditing etc.
Q.4 What are the various challenges in building the trust environment ? AU : May-17
Ans. : In cloud computing, trust is important for building healthy relationships between cloud service providers and cloud users. A trust environment between service providers and cloud users cannot be built easily, because customers place only limited belief and trust in any particular provider given the growing number of cloud service providers available on the internet. The various challenges in building the trust environment are
a) Lack of trust between service providers and cloud users can prevent cloud computing from being generally accepted as a solution for on-demand services.
b) Lack of transparency, difficulty in communication and limited confidentiality between the cloud service provider and cloud users.
c) Lack of standardization.
d) Challenges due to multi-tenancy and audit trails.
Q.5 Differentiate between authentication and authorization. AU : Dec.-19
Ans. : Authentication is the process of validating an individual's credentials, such as a user name/user ID and password, to verify the user's identity. Authentication technology provides access control for systems by checking whether a user's credentials match the credentials held in a database of authorized users or in a data authentication server. It establishes the user's identity before revealing any sensitive information.
Authorization is used to determine the permissions that are granted to an authenticated user; in simple words, it checks whether the user is permitted to access particular resources or not. Authorization occurs after authentication : once the user's identity is assured, the access list for the user is determined by looking up the entries stored in tables and databases. In the authentication process the identity of the user is checked to grant access to the system, while in the authorization process the user's privileges are checked to decide access to resources. Authentication is always done before the authorization process.
Q.6 List key privacy issues in cloud. AU : Dec.-19
Ans. : “Privacy is nothing but the rights and obligations of individuals and organizations with respect to the collection, retention and disclosure of personal information”. Although privacy is an important aspect of security, it is ignored by users most of the time. Privacy raises many concerns related to data collection, use, retention and storage in the cloud, which are listed as follows.
a) Compliance issue b) Storage issue
c) Retention issue d) Access issue
e) Auditing and monitoring f) Destruction of data
g) Privacy and security breaches
Q.7 List out the security challenges in cloud. AU : May -19
Ans. : The security challenges in cloud are
a) A Lack of Visibility and Control
b) Compliance complexity issues
c) Trust and Data Privacy Issues
d) Data Breaches and Downtime
e) Issues related to User Access Control
f) Vendor Lock-In
g) Lack of Transparency
h) Insecure Interfaces and APIs
i) Insufficient Due Diligence
j) Shared Technology Vulnerabilities
k) Potential threats like Distributed Denial of Service (DDoS), Man-in-the-Middle attacks or traffic hijacking, etc.
Q.8 How can the data security be enforced in cloud ? AU : May-19
Ans. : In Cloud computing data security can be enforced by
a) Providing data encryption for in transit data
b) Providing data privacy and privacy protection
c) Providing data availability with minimal downtime
d) Preserving data integrity
e) Maintaining confidentiality, integrity, and availability for data
f) Incorporating different access control schemes like Role Based Access Control
(RBAC), Mandatory Access Control or Discretionary Access Control.
g) Secure data from different threats
Q.9 What are three methods of resource provisioning ?
Ans. : Refer section 4.2.4.
Q.10 What is the purpose of Open Authentication in the cloud computing ?
Ans. : OAuth is a standard protocol in cloud computing which allows secure API
authorization for various types of web applications in a simple, standard method.
OAuth is an open standard for delegating access and uses it as a way of allowing
internet users to access their data on websites and applications without passwords. It is
a protocol that enables secure authorization from web, mobile or desktop applications
in a simple and standard way. It is a publication and interaction method with protected
information. It allows developers access to their data while securing credentials of their
accounts. OAuth enables users to access their information which is shared by service
providers and consumers without sharing all their identities. This mechanism is used
by companies such as Amazon, Google, Facebook, Microsoft and Twitter to permit the
users to share information about their accounts with third party applications or
websites.

Long Answered Questions
Q.1 “In today’s world, infrastructure security and data security are highly challenging at
network, host and application levels”, Justify and explain the several ways of protecting the
data at transit and at rest. AU : May-18
Ans. : Refer section 4.4.1 to 4.4.4.
Q.2 Explain the baseline Identity and access management (IAM) factors to be practices by
the stakeholders of cloud services and the common key privacy issues likely to happen in the
cloud environment. AU : May-18
Ans. : Refer section 4.9 for Identity and access management and 4.5.1 for the common
key privacy issues likely to happen in the environment.
Q.3 What is the purpose of IAM ? Describe its functional architecture with an illustration.
AU : Dec.-17
Ans. : Refer section 4.9.
Q.4 Write details about cloud security infrastructure. AU : Dec.-16
Ans. : Refer section 4.4.
Q.5 Write detailed note on identity and access management architecture. AU : May-17
Ans. : Refer section 4.9.
Q.6 Describe the IAM practices in SaaS, PaaS and IaaS availability in cloud. AU : Dec.-19
Ans. : Refer section 4.9.

®
TECHNICAL PUBLICATIONS - An up thrust for knowledge
Cloud Computing 4 - 43 Resource Management and Security in Cloud

Q.7 How is the identity and access management established in cloud to counter threats ?
AU : May-19
Ans. : Refer section 4.9.
Q.8 Write a detailed note on resource provisioning and resource provisioning methods.
Ans. : Refer section 4.2.
Q.9 How can security governance be achieved in a cloud computing environment ?
Ans. : Refer section 4.7.
Q.10 Explain the different security standards used in cloud computing.
Ans. : Refer section 4.10.
5 Cloud Technologies and Advancements
Syllabus
Hadoop – MapReduce – Virtual Box -- Google App Engine – Programming Environment for
Google App Engine - Open Stack – Federation in the Cloud – Four Levels of Federation –
Federated Services and Applications – Future of Federation.

Contents
5.1 Hadoop
5.2 Hadoop Distributed File system (HDFS)
5.3 Map Reduce
5.4 Virtual Box
5.5 Google App Engine
5.6 Programming Environment for Google App Engine
5.7 Open Stack
5.8 Federation in the Cloud
5.9 Four Levels of Federation
5.10 Federated Services and Applications
5.11 The Future of Federation
5.1 Hadoop
With the evolution of the internet and related technologies, high computational power, large volumes of data storage and faster data processing have become basic needs for most organizations, and these needs have grown significantly over time. Currently, organizations are producing huge amounts of data at a fast rate. Recent surveys on data generation report that Facebook produces roughly 600+ TB of data per day and analyzes 30+ petabytes of user-generated data, a Boeing jet airplane generates more than 10 TB of data per flight including geo maps, special images and other information, and Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.
So, there is a need to acquire, analyze, process, handle and store such a huge amount
of data called big data. The different challenges associated with such big data are given as
below
a) Volume : Volume relates to the size of big data. The amount of data is growing day by day and is very huge. According to IBM, in the year 2000, 8 lakh petabytes of data were stored in the world. So the challenge here is how to deal with such huge volumes of big data.
b) Variety : Variety relates to the different formats of big data. Nowadays, most of the data stored by organizations has no proper structure and is called unstructured data. Such data has a complex structure and cannot be represented using rows and columns. The challenge here is how to store different formats of data in databases.
c) Velocity : Velocity relates to the speed of data generation, which is very fast. It is the rate at which data is captured, generated and shared. The challenge here is how to react to the massive amount of information generated within the time required by the application.
d) Veracity : Veracity refers to the uncertainty of data. The data stored in a database is sometimes not accurate or consistent, which results in poor data quality. Inconsistent data requires a lot of effort to process.
Traditional database management techniques are incapable of handling the above four characteristics and do not support storing, processing, handling and analyzing big data. Therefore, the challenges associated with big data can be addressed using one of the most popular frameworks provided by Apache, called Hadoop.
Apache Hadoop is an open source software project that enables distributed processing of large data sets across clusters of commodity servers using programming
models. It is designed to scale up from a single server to thousands of machines, with a
very high degree of fault tolerance. It is a software framework for running the
applications on clusters of commodity hardware with massive storage, enormous
processing power and supporting limitless concurrent tasks or jobs.
The Hadoop core is divided into two fundamental components called HDFS and
MapReduce engine. The HDFS is a distributed file system inspired by GFS that organizes
files and stores their data on a distributed computing system, while MapReduce is the
computation engine running on top of HDFS as its data storage manager. The working of
HDFS and MapReduce are explained followed by Hadoop Ecosystem Components in
further sections.

5.1.1 Hadoop Ecosystem Components
Although HDFS and MapReduce are the two main components of the Hadoop architecture, there are several other components used for storing, analyzing and processing big data, collectively termed the Hadoop ecosystem components. The different components of the Hadoop ecosystem are shown in Fig. 5.1.1 and explained in Table 5.1.1.

Fig. 5.1.1 : Hadoop ecosystem components
1) HDFS : The Hadoop Distributed File System, which splits data into blocks and stores them among distributed servers for processing. It runs over multiple clusters and keeps several copies of each data block that can be used in case a failure occurs.
2) MapReduce : A programming model for processing big data. It comprises two programs, typically written in Java, called the mapper and the reducer. The mapper extracts data from HDFS and puts it into maps, while the reducer aggregates the results generated by the mappers.
3) Zookeeper : A centralized service used for maintaining configuration information with distributed synchronization and coordination.
4) HBase : A column-oriented database service used as a NoSQL solution for big data.
5) Pig : A platform for analyzing large data sets using a high-level language. It uses a dataflow language and provides a parallel execution framework.
6) Hive : Provides data warehouse infrastructure for big data.
7) Flume : Provides a distributed and reliable service for efficiently collecting, aggregating and moving large amounts of log data.
8) Sqoop : A tool designed for efficiently transferring bulk data between Hadoop and structured data stores such as relational databases.
9) Mahout : Provides libraries of scalable machine learning algorithms implemented on top of Hadoop using the MapReduce framework.
10) Oozie : A workflow scheduler system to manage Hadoop jobs.
11) Ambari : Provides a software framework for provisioning, managing and monitoring Hadoop clusters.
Table 5.1.1 : Different components of Hadoop ecosystem

Of all the above components, HDFS and MapReduce are the two core components of the Hadoop framework; they are explained in the next sections.
5.2 Hadoop Distributed File system (HDFS)
The Hadoop Distributed File System (HDFS) is the Hadoop implementation of a distributed file system designed to hold large amounts of data. It provides easy access to the stored data for many clients distributed across the network. It is highly fault tolerant and designed to run on low-cost hardware (called commodity hardware). Files in HDFS are stored across multiple machines in a redundant fashion to recover from data loss in case of failure.
HDFS enables the storage and management of large files on a distributed storage medium over a pool of data nodes. A single name node runs in a cluster and is associated with multiple data nodes; together they manage the hierarchical file organization and namespace. An HDFS file is composed of fixed-size blocks or chunks that are stored on data nodes. The name node is responsible for storing the metadata about each file, which includes attributes such as the file type, size, date and time of creation and other properties, as well as the mapping of blocks to files on the data nodes. A data node treats each data block as a separate file and shares this critical information with the name node. HDFS provides fault tolerance through data replication, which can be specified at file creation time through the degree of replication attribute (i.e., the number of copies made); this becomes progressively more significant in bigger environments consisting of many racks of data servers. The significant benefits provided by HDFS are given as follows
 It provides streaming access to file system data.
 It is suitable for distributed storage and processing.
 It is optimized to support high-throughput streaming read operations rather than low-latency random access.
 It supports file operations like read, write, delete and append, but not in-place update.
 It provides Java APIs and command line interfaces to interact with HDFS.
 It provides file permissions and authentication for files on HDFS.
 It provides continuous monitoring of name nodes and data nodes based on continuous “heartbeat” communication from the data nodes to the name node.
 It provides rebalancing of data nodes so as to equalize the load by migrating blocks of data from one data node to another.
 It uses checksums and digital signatures to manage the integrity of data stored in a file.
 It has built-in metadata replication so as to recover data during the failure or to
protect against corruption.
 It also provides synchronous snapshots to facilitate rollback during failures.

5.2.1 Architecture of HDFS
HDFS follows a master-slave architecture using a name node and data nodes. The name node acts as the master while multiple data nodes work as slaves. HDFS is implemented as a block-structured file system in which files are broken into blocks of fixed size that are stored on the Hadoop cluster. The HDFS architecture is shown in Fig. 5.2.1.

Fig. 5.2.1 : HDFS architecture

HDFS is composed of the following elements :

1. Name Node

An HDFS cluster consists of a single name node, called the master server, that manages the file system namespace and regulates access to files by clients. It runs on commodity hardware and stores all the metadata for the file system across the cluster. The name node serves as the single arbitrator and repository for HDFS metadata, which is kept in main memory for faster random access. The entire file system namespace is contained in a file called FsImage stored on the name node's file system, while the transaction log is recorded in the EditLog file.

2. Data Node

In HDFS, multiple data nodes exist that manage the storage attached to the nodes they run on. They are usually used to store users' data on the HDFS cluster.
Internally, a file is split into one or more blocks that are stored on data nodes. The data nodes are responsible for handling read/write requests from clients; they also perform block creation, deletion and replication upon instruction from the name node. A data node stores each HDFS data block in a separate file, and the different blocks of a file are stored on different data nodes. The requirement of such a block-structured file system is to store, manage and access file metadata reliably.
The representation of name node and data node is shown in Fig. 5.2.2.

Fig. 5.2.2 : Representation of name node and data nodes

3. HDFS Client

In Hadoop distributed file system, the user applications access the file system using
the HDFS client. Like other file systems, HDFS supports various operations to read, write and delete files, and operations to create and delete directories. The user references files and directories by paths in the namespace. The user application does not need to be aware that file system metadata and storage are on different servers, or that blocks have multiple replicas. When an application reads a file, the HDFS client first asks the name
node for the list of data nodes that host replicas of the blocks of the file. The client
contacts a data node directly and requests the transfer of the desired block. When a client
writes, it first asks the name node to choose data nodes to host replicas of the first block of
the file. The client organizes a pipeline from node-to-node and sends the data. When the
first block is filled, the client requests new data nodes to be chosen to host replicas of the
next block. The Choice of data nodes for each block is likely to be different.
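The same read path can be observed from outside the cluster through the WebHDFS REST interface, which HDFS exposes alongside its Java API. In the hedged Python sketch below, the OPEN request is sent to the name node, which redirects the client to a data node holding a replica of the block; the host name, port and user name are placeholders that depend on the actual cluster configuration.

import requests

# Placeholder cluster details - substitute the real name node address and user name.
NAMENODE = "http://namenode.example.com:9870"
USER = "hduser"

def read_hdfs_file(path):
    """Read a file via WebHDFS : the name node answers with a redirect to a data node."""
    url = "{}/webhdfs/v1{}".format(NAMENODE, path)
    # requests follows the 307 redirect from the name node to the chosen data node.
    response = requests.get(url, params={"op": "OPEN", "user.name": USER}, timeout=30)
    response.raise_for_status()
    return response.content

data = read_hdfs_file("/user/hduser/input/sample.txt")
print(len(data), "bytes read")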

4. HDFS Blocks

In general, user data is stored in HDFS in terms of blocks : the files in the file system are divided into one or more segments called blocks. The default size of an HDFS block is 64 MB, which can be increased as per need.
HDFS is fault tolerant, so if a data node fails, the block currently being written to that data node is re-replicated to some other node. The block size, number of
replicas and replication factor are specified in the Hadoop configuration files. The synchronization between the name node and the data nodes is done through heartbeat messages, which are periodically sent by each data node to the name node.
Apart from the above components, a job tracker and task trackers are used when a MapReduce application runs over HDFS. The Hadoop core consists of one master job tracker and several task trackers. The job tracker runs on the name node as a master, while the task trackers run on the data nodes as slaves.
The job tracker is responsible for taking the requests from a client and assigning task
trackers to it with tasks to be performed. The job tracker always tries to assign tasks to
the task tracker on the data nodes where the data is locally present. If for some reason the
node fails the job tracker assigns the task to another task tracker where the replica of the
data exists since the data blocks are replicated across the data nodes. This ensures that the
job does not fail even if a node fails within the cluster.

5.3 Map Reduce
MapReduce is a programming model provided by Hadoop that allows expressing distributed computations on huge amounts of data. It provides easy scaling of data processing over multiple computational nodes or clusters. In the MapReduce model, the data processing primitives used are called the mapper and the reducer. Every MapReduce program must have at least one mapper and one reducer subroutine. The mapper has a map method that transforms an input key-value pair into any number of intermediate key-value pairs, while the reducer has a reduce method that transforms the aggregated intermediate key-value pairs into any number of output key-value pairs.
MapReduce keeps all processing operations separate for parallel execution : a complex and extremely large problem is decomposed into subtasks, and these subtasks are executed independently of each other. After that, the results of all the independent executions are combined to produce the complete output.

5.3.1 Features of MapReduce
The different features provided by MapReduce are explained as follows
 Synchronization : The MapReduce supports execution of concurrent tasks. When
the concurrent tasks are executed, they need synchronization. The synchronization
is provided by reading the state of each MapReduce operation during the execution
and uses shared variables for those.
 Data locality : In MapReduce, although the data resides on different clusters, it appears local to the user's application. To obtain the best results, the code and data of an application should reside on the same machine.
 Error handling : The MapReduce engine provides different fault tolerance mechanisms in case of failure. When tasks are running on different cluster nodes and a failure occurs, the MapReduce engine finds the incomplete tasks and reschedules them for execution on different nodes.
 Scheduling : MapReduce involves map and reduce operations that divide large problems into smaller chunks which are run in parallel by different machines, so there is a need to schedule the different tasks on computational nodes on a priority basis; this is taken care of by the MapReduce engine.

5.3.2 Working of MapReduce Framework
The unit of work in MapReduce is a job. During the map phase, the input data is divided into input splits for analysis, where each split is processed as an independent task. These tasks run in parallel across the Hadoop cluster. The reduce phase uses the results obtained from the mappers as its input to generate the final result.
The MapReduce takes a set of input <key, value> pairs and produces a set of output
<key, value> pairs by supplying data through map and reduce functions. The typical
MapReduce operations are shown in Fig. 5.3.1.

Fig. 5.3.1 : MapReduce operations

Every MapReduce program undergoes different phases of execution. Each phase has
its own significance in MapReduce framework. The different phases of execution in
MapReduce are shown in Fig. 5.3.2 and explained as follows.
Fig. 5.3.2 : Different phases of execution in MapReduce

In the input phase, a large data set in the form of <key, value> pairs is provided as the standard input for the MapReduce program. The input files used by MapReduce are kept on the HDFS (Hadoop Distributed File System) store and have a standard InputFormat specified by the user.
Once the input file is selected, the split phase reads the input data and divides it into smaller chunks. The split chunks are then given to the mapper. The map operation extracts the relevant data and generates intermediate key-value pairs : it reads the input data from a split using a record reader, transforms the input key-value list into an output key-value list and passes it to the combiner.
The combiner is used between the mapper and the reducer to reduce the volume of data transferred. It is also known as a semi-reducer, since it accepts input from the mapper and passes output key-value pairs to the reducer. Shuffle and sort are components of the reducer. Shuffling is the process of partitioning the mapped output and moving it to the reducers, where intermediate keys are assigned to a reducer; each partition is called a subset and becomes the input to one reducer. In general, the shuffle phase ensures that the partitioned splits reach the appropriate reducers, and each reducer uses the HTTP protocol to retrieve its own partition from the mappers.
The sort phase is responsible for automatically sorting the intermediate keys on a single node before they are presented to the reducer. The shuffle and sort phases occur simultaneously, with the mapped outputs being fetched and merged. The reducer reduces a set of intermediate values that share a key to a smaller set of values. The reducer uses the sorted input to generate the final output, which is written by a record writer in the standard output format.
The final output of each MapReduce program consists of key-value pairs written to an output file, which is stored back on HDFS. An example of the word count process using MapReduce, with all phases of execution, is illustrated in Fig. 5.3.3.

Fig. 5.3.3 : Word count process using MapReduce
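Although the chapter describes the mapper and reducer as Java programs, Hadoop Streaming allows the same word count logic to be written in any language that reads standard input and writes standard output. The following Python sketch is a minimal illustration of the map and reduce steps shown in Fig. 5.3.3; the file layout and the way the scripts are submitted to Hadoop Streaming are assumptions for the example.

import sys

def run_mapper():
    # Map step : emit one <word, 1> pair per word occurrence on standard input.
    for line in sys.stdin:
        for word in line.strip().split():
            print("{}\t{}".format(word, 1))

def run_reducer():
    # Reduce step : Hadoop Streaming sorts the mapper output by key before this runs,
    # so equal words arrive on consecutive lines and can be summed in one pass.
    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print("{}\t{}".format(current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print("{}\t{}".format(current_word, current_count))

if __name__ == "__main__":
    # In a real streaming job the mapper and reducer are separate scripts passed to
    # the hadoop-streaming jar with the -mapper and -reducer options. Locally the
    # pipeline can be simulated as :
    #   python wordcount.py map < input.txt | sort | python wordcount.py reduce
    run_mapper() if sys.argv[1:] == ["map"] else run_reducer()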

5.4 Virtual Box
VirtualBox (formerly Sun VirtualBox and presently called Oracle VM VirtualBox) is an x86 virtualization software package, created by the software company Innotek GmbH, purchased by Sun Microsystems, and now taken over by Oracle Corporation as part of its family of virtualization products. It is cross-platform virtualization software that allows users to extend their existing computer to run multiple operating systems at the same time. VirtualBox runs on Microsoft Windows, Mac OS, Linux and Solaris systems. It is ideal for testing, developing, demonstrating and deploying solutions across multiple platforms on a single machine.
It is a Type II (hosted) hypervisor that can be installed on an existing host operating system as an application. This hosted application allows additional operating systems, known as guest OSes, to run inside it. Each guest OS can be loaded and run in its own virtual environment. VirtualBox allows you to run guest operating systems using its own
virtual hardware. Each instance of a guest OS is called a “virtual machine”. The functional architecture of the VirtualBox hypervisor is shown in Fig. 5.4.1.
Fig. 5.4.1 : Functional architecture of Virtual Box hypervisor
VirtualBox has a lightweight, extremely fast and powerful virtualization engine. The guest system will run in its VM environment just as if it were installed on a real computer. It operates according to the VM settings you have specified. All software that you choose to run on the guest system will operate just as it would on a physical computer. Each VM runs over its own independent virtualized hardware.
The latest version of VirtualBox simplifies cloud deployment by allowing developers to create multiplatform environments and to develop applications for container and virtualization technologies within Oracle VM VirtualBox on a single machine. VirtualBox also supports virtual machine disk (.vmdk) and virtual hard disk (.vhd) images made using VMware Workstation or Microsoft Virtual PC, so it can flawlessly run and integrate guest machines which were configured via VMware Workstation or other hypervisors.
The VirtualBox provides the following main features
 It supports fully paravirtualized environments along with hardware virtualization.
 It provides device drivers in its driver stack which improve the performance of virtualized input/output devices.
 It provides shared folder support to copy data from the host OS to the guest OS and vice versa.
 It has the latest virtual USB controller support.
 It facilitates a broad range of virtual network drivers along with host, bridged and NAT modes.
 It supports the Remote Desktop Protocol to connect to a Windows virtual machine (guest OS) remotely on a thin, thick or mobile client seamlessly.
 It has support for the virtual disk formats used by both VMware and Microsoft Virtual PC hypervisors.
5.5 Google App Engine
Google App Engine (GAE) is a Platform-as-a-Service cloud computing model that
supports many programming languages. GAE is a scalable runtime environment mostly devoted to executing web applications. In fact, it allows developers to integrate third-party frameworks and libraries while the infrastructure is still managed by Google. It allows
developers to use readymade platform to develop and deploy web applications using
development tools, runtime engine, databases and middleware solutions. It supports
languages like Java, Python, .NET, PHP, Ruby, Node.js and Go in which developers can
write their code and deploy it on available google infrastructure with the help of Software
Development Kit (SDK). In GAE, SDKs are required to set up your computer for
developing, deploying, and managing your apps in App Engine. GAE enables users to
run their applications on a large number of data centers associated with Google’s search
engine operations. Presently, Google App Engine is a fully managed, serverless platform that allows developers to choose from several popular languages, libraries and frameworks to develop their applications, while App Engine takes care of provisioning servers and scaling application instances based on demand. The functional architecture of the Google cloud platform for App Engine is shown in Fig. 5.5.1.

Fig. 5.5.1 : Functional architecture of the Google cloud platform for app engine
The infrastructure for Google cloud is managed inside data centers. All the cloud services and applications on Google run on servers inside these data centers. Inside each data center there are thousands of servers forming different clusters, and each cluster can run multipurpose servers. The infrastructure for GAE is composed of four main components : the Google File System (GFS), MapReduce, BigTable and Chubby. GFS is used for storing large amounts of data on Google storage clusters. MapReduce is used for application program development with data processing on large clusters. Chubby is used as a distributed application locking service, while BigTable offers a storage service for accessing structured as well as unstructured data. In this architecture, users interact with Google applications via the web interface provided by each application.
The GAE platform comprises five main components :
 An application runtime environment that offers a platform with a built-in execution engine for scalable web programming and execution.
 A Software Development Kit (SDK) for local application development and deployment over the Google cloud platform.
 A Datastore that provisions object-oriented, distributed, structured data storage for applications and their data. It also provides secure data management operations based on BigTable techniques.
 An admin console used for easy management of user application development and resource management.
 A GAE web service providing APIs and interfaces.
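As a rough illustration of how little application code the platform requires, the sketch below is a minimal Python web application of the kind that can be deployed to the App Engine standard environment. It assumes the Flask framework and is accompanied in practice by an app.yaml file declaring the runtime; the handler shown here is a placeholder.

from flask import Flask

# App Engine's runtime imports this module and serves the WSGI app it exposes.
app = Flask(__name__)

@app.route("/")
def hello():
    # The platform handles provisioning and scaling; the code only defines handlers.
    return "Hello from App Engine!"

if __name__ == "__main__":
    # Local development server; on App Engine the platform runs the app instead.
    app.run(host="127.0.0.1", port=8080, debug=True)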

5.6 Programming Environment for Google App Engine
Google provides programming support for its cloud environment, that is, Google App Engine, through the Google File System (GFS), Big Table and Chubby. The following sections provide a brief description of GFS, Big Table, Chubby and the Google APIs.

5.6.1 The Google File System (GFS)
Google has designed a distributed file system, named GFS, to meet its exacting demands of processing large amounts of data. Most of the objectives in designing GFS are similar to those of earlier distributed file systems. Some of these objectives include availability, performance, reliability and scalability. GFS
has also been designed with certain challenging assumptions that also provide
opportunities for developers and researchers to achieve these objectives. Some of the
assumptions are listed as follows :
a) Automatic recovery from component failure on a routine basis


b) Efficient storage support for large - sized files as a huge amount of data to be
processed is stored in these files. Storage support is provided for small - sized files
without requiring any optimization for them.
c) With workloads that mainly consist of two kinds of reads, large streaming reads and small random reads, the system should be performance conscious, so that small reads are made steady by batching and sorting them and advancing through the file rather than going back and forth.
d) The system supports small writes without being inefficient, along with the usual
large and sequential writes through which data is appended to files.
e) Semantics that are defined well are implemented.
f) Atomicity is maintained with the least overhead due to synchronization.
g) Provisions for sustained bandwidth is given priority rather than a reduced latency.
Google takes the aforementioned assumptions into consideration, and supports its
cloud platform, Google Apps Engine, through GFS. Fig. 5.6.1 shows the architecture of
the GFS clusters.

Fig. 5.6.1 : Architecture of GFS clusters

GFS provides a file system interface and different APIs for supporting different file
operations such as create to create a new file instance, delete to delete a file instance, open to
open a named file and return a handle, close to close a given file specified by a handle,
read to read data from a specified file and write to write data to a specified file.
It can be seen from Fig. 5.6.1 that a GFS cluster comprises a single GFS Master and three chunk servers serving two clients. These clients and chunk servers, as well as the Master, are Linux machines, each running a server process at the user level. These processes are known as user-level server processes.
In GFS, the metadata is managed by the GFS Master that takes care of all the
communication between the clients and the chunk servers. Chunks are small blocks of
data that are created from the system files. Their usual size is 64 MB. The clients interact
directly with chunk servers for transferring chunks of data. For better reliability, these
chunks are replicated across three machines so that whenever the data is required, it can
be obtained in its complete form from at least one machine. By default, GFS stores three
replicas of the chunks of data. However, users can designate any levels of replication.
Chunks are created by dividing the files into fixed-sized blocks. A unique immutable
handle (of 64-bit) is assigned to each chunk at the time of their creation by the GFS
Master. The data that can be obtained from the chunks, the selection of which is specified
by the unique handles, is read or written on local disks by the chunk servers. GFS has all
the familiar system interfaces. It also has additional interfaces in the form of snapshots
and appends operations. These two features are responsible for creating a copy of files or
folder structure at low costs and for permitting a guaranteed atomic data-append
operation to be performed by multiple clients of the same file concurrently.
Applications use a file system Application Programming Interface (API) that is implemented by the GFS client code linked into them. This client code communicates with the GFS Master and chunk servers to perform read and write operations on behalf of the application.
Master only for metadata operations. However, data-bearing communications are
forwarded directly to chunk servers. POSIX API, a feature that is common to most of the
popular file systems, is not included in GFS, and therefore, Linux vnode layer hook-in is
not required. Neither clients nor chunk servers cache file data. Because of the streamed workloads, caching brings little benefit to clients, while on the chunk servers the Linux buffer cache already keeps frequently requested data in memory locally, so an additional cache adds little.
The GFS provides the following features :
 Large - scale data processing and storage support
 Normal treatment for components that stop responding
 Optimization for large-sized files (mostly appended concurrently and read
sequentially)
 Fault tolerance by constant monitoring, data replication, and automatic recovering
 Data corruption detections at the disk or Integrated Development Environment
(IDE) subsystem level through the checksum method
 High throughput for concurrent readers and writers
 Simple designing of the Master that is centralized and not bottlenecked
GFS provides caching for the performance and scalability of a file system and logging
for debugging and performance analysis.

5.6.2 Big Table
Google's Big table is a distributed storage system that allows huge volumes of structured as well as unstructured data to be stored on storage media. Google created Big Table with the aim of developing a fast, reliable, efficient and scalable storage system that can process concurrent requests at high speed. Millions of users access billions of web pages and many hundreds of TBs of satellite images, and a lot of semi-structured data is generated from Google services or users' web access. This data needs to be stored, managed and processed to retrieve insights, which requires data management systems with very high scalability.
Google's aim behind developing Big Table was to provide a highly efficient system for
managing a huge amount of data so that it can help cloud storage services. It is required
for concurrent processes that can update various data pieces so that the most recent data
can be accessed easily at a fast speed. The design requirements of Big Table are as
follows :
1. High speed
2. Reliability
3. Scalability
4. Efficiency
5. High performance
6. Examination of changes that take place in data over a period of time.
Big Table is a popular, distributed data storage system that is highly scalable and self-
managed. It involves thousands of servers, terabytes of data storage for in-memory
operations, millions of read/write requests by users in a second and petabytes of data
stored on disks. Its self-managing services help in dynamic addition and removal of
servers that are capable of adjusting the load imbalance by themselves.
It has gained extreme popularity at Google as it stores almost all kinds of data, such as web indexes, personalized searches, Google Earth, Google Analytics and Google Finance. The data it holds from the Web is referred to as a Web table. The generalized architecture of Big table is shown in Fig. 5.6.2.

Fig. 5.6.2 : Generalized architecture of Big table

It is composed of three entities, namely the client, the Big table master and the tablet servers. Big tables are implemented over one or more clusters that are similar to GFS clusters. The client application uses libraries to execute Big table queries on the master server. A Big table is broken up into units called tablets, each 100 to 200 MB in size, which are served by the slave (tablet) servers that execute the secondary tasks.
The master server is responsible for allocating tablets to tasks, clearing garbage
collections and monitoring the performance of tablet servers. The master server splits
tasks and executes them over tablet servers. The master server is also responsible for
maintaining a centralized view of the system to support optimal placement and load-
balancing decisions. It performs separate control and data operations strictly with tablet
servers. Upon granting the tasks, tablet servers provide row access to clients. Fig. 5.6.3
shows the structure of Big table :
Fig. 5.6.3 : Structure of Big table

Big Table is arranged as a sorted map that is spread across multiple dimensions and is sparse, distributed and persistent. The Big Table data model primarily combines three dimensions, namely row, column and timestamp. The first two dimensions are string types, whereas the time dimension is taken as a 64-bit integer. The value stored for a combination of these dimensions is a string.
Each row in Big table has an associated row key that is an arbitrary string of up to
64 KB in size. In Big Table, a row name is a string, and the rows are ordered in lexicographic form. Although Big Table rows do not support the relational model, they
offer atomic access to the data, which means you can access only one record at a time. The
rows contain a large amount of data about a given entity such as a web page. The row
keys represent URLs that contain information about the resources that are referenced by
the URLs.
The naming conventions used for columns are more structured than those of rows. Columns are organized into a number of column families that logically group data of the same type under one family. Individual columns are designated by qualifiers within families. In other words, a given column is referred to using the syntax column_family:optional_qualifier, where column_family is a printable string and qualifier is an arbitrary string. It is necessary to give a name to the first level, which is known as the column family, but it is not mandatory to give a name to a qualifier. The column family contains information about the data type and is actually the unit of access control.
Qualifiers are used for assigning columns in each row. The number of columns that can
be assigned in a row is not restricted.
The other important dimension that is assigned to Big Table is a timestamp. In Big
table, the multiple versions of data are indexed by timestamp for a given cell. The
timestamp is either related to real-time or can be an arbitrary value that is assigned by a
programmer. It is used for storing various data versions in a cell. By default, any new
data that is inserted into Big Table is taken as current, but you can explicitly set the
timestamp for any new write operation in Big Table. Timestamps provide the Big Table
lookup option that returns the specified number of the most recent values. It can be used
for marking the attributes of the column families. The attributes either retain the most
recent values in a specified number or keep the values for a particular time duration.
Big Table supports APIs that can be used by developers to perform a wide range of
operations such as metadata operations, read/write operations, or modify/update
operations. The commonly used operations by APIs are as follows:
 Creation and deletion of tables
 Creation and deletion of column families within tables
 Writing or deleting cell values
 Accessing data from rows
 Associate metadata such as access control information with tables and column
families
The functions that are used for atomic write operations are as follows :
 Set () is used for writing cells in a row.
 DeleteCells () is used for deleting cells from a row.
 DeleteRow() is used for deleting the entire row, i.e., all the cells from a row are
deleted.
It is clear that Big Table is a highly reliable, efficient and fast system that can be used by users for storing different types of semi-structured or unstructured data.
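To make the three-dimensional data model concrete, the following pure-Python sketch mimics how a Big Table cell is addressed by row key, column family:qualifier and timestamp, with lookups returning the most recent versions first. It is only a conceptual illustration of the model described above, not Google's actual Big Table API; all names and values are invented for the example.

import bisect

class TinyBigtableModel:
    """Toy in-memory model of the (row, column, timestamp) -> value map."""

    def __init__(self):
        # {row_key: {"family:qualifier": [(timestamp, value), ... sorted ascending]}}
        self._rows = {}

    def set_cell(self, row_key, column, timestamp, value):
        versions = self._rows.setdefault(row_key, {}).setdefault(column, [])
        bisect.insort(versions, (timestamp, value))   # keep versions ordered by time

    def read(self, row_key, column, num_versions=1):
        """Return up to num_versions of a cell, most recent first."""
        versions = self._rows.get(row_key, {}).get(column, [])
        return list(reversed(versions[-num_versions:]))

# Example : a web table row keyed by URL, column family "anchor"
table = TinyBigtableModel()
table.set_cell("com.example.www", "anchor:link1", 100, "Example home page")
table.set_cell("com.example.www", "anchor:link1", 200, "Example (updated)")
print(table.read("com.example.www", "anchor:link1"))      # newest version only
print(table.read("com.example.www", "anchor:link1", 2))   # two most recent versions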

5.6.3 Chubby
Chubby is a crucial service in the Google infrastructure that offers storage and coordination for other infrastructure services such as GFS and Bigtable. It is a coarse-grained distributed locking service that is used for synchronizing distributed activities in an asynchronous environment on a large scale. It is used as a name service within Google and provides reliable storage for file systems, along with the election of a coordinator among multiple replicas. The Chubby interface is similar to the interfaces that are provided by
distributed systems with advisory locks. However, the aim of designing Chubby is to
provide reliable storage with consistent availability. It is designed to use with loosely
coupled distributed systems that are connected in a high-speed network and contain
several small-sized machines. The lock service enables the synchronization of the
activities of clients and permits the clients to reach a consensus about the environment in
which they are placed. Chubby’s main aim is to efficiently handle a large set of clients by
providing them a highly reliable and available system. Its other important characteristics
that include throughput and storage capacity are secondary. Fig. 5.6.4 shows the typical
structure of a Chubby system :

Fig. 5.6.4 : Structure of a Chubby system

The chubby architecture involves two primary components, namely server and client
library. Both the components communicate through a Remote Procedure Call (RPC).
However, the library has a special purpose, i.e., linking the clients against the chubby cell.
A Chubby cell contains a small set of servers. The servers are also called replicas, and
usually, five servers are used in every cell. The Master is elected from the five replicas
through a distributed protocol that is used for consensus. Most of the replicas must vote
for the Master with the assurance that no other Master will be elected by replicas that
have once voted for one Master for a duration. This duration is termed as a Master lease.
Chubby supports a file system similar to that of Unix, although the Chubby file system is
simpler. The files and directories, known as nodes, are contained in the Chubby
namespace. Each node is associated with different types of metadata. The nodes are
opened to obtain Unix-like file descriptors known as handles. The specifiers for handles
include check digits that prevent clients from guessing handles, handle sequence
numbers, and mode information for recreating the lock state when the Master changes.
Reader and writer locks are implemented by Chubby using files and directories. While
exclusive permission for a lock in writer mode can be obtained by only a single client, any
number of clients can share a lock in reader mode. The locks are advisory in nature, and a
conflict occurs only when the same lock is requested again for acquisition. Distributed
locking in this mode is complex; on one hand its use is costly, and on the other hand it
only permits numbering the interactions that are already using locks. The status of locks
after they are acquired can be described using specific descriptor strings called
sequencers. Sequencers are requested by lock holders and passed by clients to servers in
order to proceed with protection.
Another important concept in Chubby is an event, which can be subscribed to by clients
after the creation of handles. An event is delivered when the action that corresponds to it
is completed. An event can be :
a. Modification in the contents of a file
b. Addition, removal, or modification of a child node
c. Failing over of the Chubby Master
d. Invalidity of a handle
e. Acquisition of lock by others
f. Request for a conflicting lock from another client
In Chubby, caching is done by a client that stores file data and metadata to reduce read
traffic towards the Master. Although handles and file locks can also be cached, the Master
maintains a list of what each client may be caching. Due to caching, clients find the data to
be consistent; if this is not the case, an error is flagged. Chubby maintains sessions between
clients and servers with the help of a keep-alive message, which is required every few
seconds to remind the system that the session is still active. Handles that are held by
clients are released by the server in case the session is overdue for any reason. If the
Master responds late to a keep-alive message, as may happen at
times, a client has its own timeout (which is longer than the server timeout) for the
detection of the server failure.
If a server failure has indeed occurred, the Master does not respond to the client's
keep-alive message within the local lease timeout. This puts the session in jeopardy. It can be
recovered in the manner explained in the following points :
 The cache needs to be cleared.
 The client needs to wait for a grace period, which is about 45 seconds.
 Another attempt is made to contact the Master.
If the attempt to contact the Master is successful, the session resumes and its jeopardy
is over. However, if this attempt fails, the client assumes that the session is lost. Fig. 5.6.5
shows the case of the failure of the Master :

Fig. 5.6.5 : Case of failure of Master server
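The recovery sequence described above can be summarized in a short client-side sketch in Python; Chubby's client library is not public, so the clear_cache() and try_contact_master() methods are assumed names used only to mirror the three steps.

import time

GRACE_PERIOD_SECONDS = 45   # approximate grace period mentioned above

def handle_session_jeopardy(client):
    """Hypothetical recovery logic when the local lease times out (session in jeopardy)."""
    client.clear_cache()                  # step 1 : clear cached file data and metadata
    time.sleep(GRACE_PERIOD_SECONDS)      # step 2 : wait out the grace period
    if client.try_contact_master():       # step 3 : one more attempt to reach the Master
        return "session resumed"          # jeopardy is over
    return "session lost"                 # the client assumes the session is gone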

Chubby offers a decent level of scalability, which means that there can be any
(unspecified) number of the Chubby cells. If these cells are fed with heavy loads, the lease
timeout increases. This increment can be anything between 12 seconds and 60 seconds.
The data is kept in small packages and held in Random-Access Memory (RAM) only.
The Chubby system also uses partitioning mechanisms to divide data into smaller
packages. With all of its services and applications taken together, Chubby has proved to
be a great innovation when it comes to storage, locking, and program support services.
Chubby is implemented using the following APIs :
1. Creation of handles using the open() method
2. Destruction of handles using the close() method
The other important methods include GetContentsAndStat(), GetStat(), ReadDir(),
SetContents(), SetACl(), Delete(), Acquire(), TryAcquire(), Release(), GetSequencer(),
SetSequencer(), and CheckSequencer(). The commonly used APIs in Chubby are listed in
Table 5.6.1 :
API Description
Open Opens the file or directory and returns a handle
Close Closes an open handle of the file or directory
Delete Deletes the file or directory
ReadDir Returns the contents of a directory
SetContents Writes the contents of a file
GetStat Returns the metadata associated with the file
GetContentsAndStat Returns the file contents along with the metadata associated with the file
Acquire Acquires a lock on a file
Release Releases a lock on a file
Table 5.6.1 : APIs in Chubby
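A hedged sketch of how the APIs in Table 5.6.1 are usually combined by a client, for example to elect a coordinator, is given below; because Chubby is internal to Google, the chubby client object and the node path are assumptions made purely for illustration.

def elect_coordinator(chubby, node_path="/ls/example-cell/my-service/leader"):
    """Hypothetical election routine built from the calls in Table 5.6.1.
    'chubby' is an assumed client object exposing Open/TryAcquire/... as methods."""
    handle = chubby.Open(node_path)                        # open the node and obtain a handle
    if chubby.TryAcquire(handle):                          # attempt the exclusive (writer) lock
        sequencer = chubby.GetSequencer(handle)            # sequencer describes the held lock
        chubby.SetContents(handle, b"primary=worker-07")   # record which replica is the primary
        # ... act as the elected coordinator, passing the sequencer to other servers ...
        chubby.Release(handle)                             # give up the lock when done
    chubby.Close(handle)                                   # destroy the handle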

5.6.4 Google APIs


Google has developed a set of Application Programming Interfaces (APIs) that can be used
to communicate with Google services; this set of APIs is referred to as the Google APIs.
They help in integrating Google services with other services. Google App Engine helps in
deploying an API for an app without the developer being aware of its underlying
infrastructure. Google App Engine also hosts the endpoint APIs created by Google Cloud
Endpoints, which is a set of libraries, tools, and capabilities used to generate APIs and
client libraries from an App Engine application. It eases data accessibility for client
applications and saves the time of writing network communication code, since Google
Cloud Endpoints can also generate client libraries for accessing the backend API.
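As a hedged example of consuming a Google API from Python, the google-api-python-client library can build a service object from an API name and version through its discovery mechanism; the Books API, version string and API key below are placeholders, and any other Google API would follow the same pattern.

# Sketch of calling a Google API through the google-api-python-client discovery service.
from googleapiclient.discovery import build

def search_books(query, api_key):
    service = build("books", "v1", developerKey=api_key)   # client built from the discovery document
    response = service.volumes().list(q=query).execute()   # issues the underlying REST call
    for item in response.get("items", []):
        print(item["volumeInfo"]["title"])

# search_books("cloud computing", "YOUR_API_KEY")          # placeholder API key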

5.7 Open Stack


OpenStack is an open-source cloud operating system that is increasingly gaining
adoption in data centers. This is because OpenStack provides a cloud computing
platform to handle enormous computing, storage, database and networking resources in
a data center. Put simply, OpenStack is an open-source, highly scalable cloud computing
platform that provides tools for developing private, public or hybrid clouds, along with a
web interface for users to access resources and for admins to manage those resources.
Put otherwise, OpenStack is a platform that enables potential cloud providers to
create, manage and bill their custom-made VMs to their future customers. OpenStack is
free and open, which essentially means that everyone can access its source code, suggest
or make changes to it, and share it with the OpenStack community. Technically,
OpenStack provides Infrastructure-as-a-Service (IaaS) to its users, enabling them to
manage virtual private servers in their data centers.
OpenStack provides the required software tools and technologies to abstract the
underlying infrastructure to a uniform consumption model. Basically, OpenStack allows
various organisations to provide cloud services to the user community by leveraging the
organization’s pre-existing infrastructure. It also provides options for scalability so that
resources can be scaled whenever organisations need to add more resources without
hindering the ongoing processes.
The main objective of OpenStack is to provide a cloud computing platform that is :
 Global
 Open-source
 Freely available
 Easy to use
 Highly and easily scalable
 Easy to implement
 Interoperable
OpenStack is for all. It satisfies the needs of users, administrators and operators of
private clouds as well as public clouds. Some examples of open-source cloud platforms
already available are Eucalyptus, OpenNebula, Nimbus, CloudStack and OpenStack,
which are used for infrastructure control and are usually implemented in private clouds.

5.7.1 Components of OpenStack


OpenStack consists of many different components. Because OpenStack is open-source,
developers can add components to benefit the OpenStack community. The following are
the core components of OpenStack as identified by the OpenStack community :
 Nova : This is one of the primary services of OpenStack, which provides numerous
tools for the deployment and management of a large number of virtual machines.
Nova is the compute service of OpenStack.
 Swift : Swift provides storage services for storing files and objects. Swift can be
equated with Amazon’s Simple Storage System (S3).
 Cinder : This component provides block storage to Nova Virtual Machines. Its
working is similar to a traditional computer storage system where the computer is
able to access specific locations on a disk drive. Cinder is analogous to AWS’s EBS.
 Glance : Glance is OpenStack's image service component that provides virtual
templates (images) of hard disks. These templates can be used for new VMs. Glance
may use either Swift or flat files to store these templates.
 Neutron (formerly known as Quantum) : This component of OpenStack provides
Networking-as-a- Service, Load-Balancer-as-a-Service and Firewall- as-a-Service. It
also ensures communication between other components.
 Heat : It is the orchestration component of OpenStack. It allows users to manage
infrastructural needs of applications by allowing the storage of requirements in
files.
 Keystone : This component provides identity management in OpenStack
 Horizon : This is a dashboard of OpenStack, which provides a graphical interface.
 Ceilometer : This component of OpenStack provisions meters and billing models
for users of the cloud services. It also keeps an account of the resources used by
each individual user of the OpenStack cloud.
Let us also discuss some of the non-core components of OpenStack and their offerings :
 Trove : Trove is a component of OpenStack that provides Database-as-a- service. It
provisions relational databases and big data engines.
 Sahara : This component provisions Hadoop to enable the management of data
processors.
 Zaqar : This component allows messaging between distributed application
components.
 Ironic : Ironic provisions bare-metals, which can be used as a substitute to VMs.
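To show how these services are consumed programmatically, the sketch below uses the openstacksdk Python library, one common client among several, to authenticate through Keystone and boot a Nova server from a Glance image on a Neutron network; the cloud name, image, flavor and network names are placeholders.

# Sketch using the openstacksdk library; the resource names are placeholders.
import openstack

conn = openstack.connect(cloud="mycloud")            # credentials resolved via clouds.yaml / environment (Keystone)

image = conn.compute.find_image("ubuntu-22.04")      # image registered with Glance
flavor = conn.compute.find_flavor("m1.small")        # compute flavor
network = conn.network.find_network("private")       # Neutron network

server = conn.compute.create_server(
    name="demo-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)        # block until the VM becomes ACTIVE
print(server.status)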
The basic architectural components of OpenStack, shown in Fig. 5.7.1, include its core
and optional services/components. The optional services of OpenStack are also known as
Big Tent services; OpenStack can be used without these components, or they can be used
as per requirement.

Fig. 5.7.1 : Components of open stack architecture


We have already discussed the core services and the four optional services. Let us now
discuss the rest of the services.
 Designate : This component offers DNS services analogous to Amazon’s Route 53.
The following are the subsystems of Designate :
o Mini DNS Server
o Pool Manager
o Central Service and APIs
 Barbican : Barbican is the key management service of OpenStack that is comparable
to KMS from AWS. This provides secure storage, retrieval, and provisioning and
management of various types of secret data, such as keys, certificates, and even
binary data.
 AMQP : AMQP stands for Advanced Message Queuing Protocol and is a messaging
mechanism used by OpenStack. The AMQP broker lies between any two components of
Nova and enables them to communicate in a loosely coupled fashion.
Further, OpenStack uses two architectures - Conceptual and Logical, which are
discussed in the next section.

5.7.2 Features and Benefits of OpenStack


OpenStack helps build cloud environments by providing the ability to integrate various
technologies of your choice. Apart from the fact that OpenStack is open-source, there are
numerous benefits that make it stand out. Following are some of the features and benefits
of OpenStack Cloud :

 Compatibility : OpenStack supports both private and public clouds and is very easy
to deploy and manage. OpenStack offers API compatibility with Amazon Web Services,
which eliminates the need to rewrite applications for AWS and thus enables
easy portability between public and private clouds.
 Security : OpenStack addresses the security concerns, which are the top- most
concerns for most organisations, by providing robust and reliable security systems.
 Real-time Visibility : OpenStack provides real-time client visibility to
administrators, including visibility of resources and instances, thus enabling
administrators and providers to track what clients are requesting for.
 Live Upgrades : This feature allows upgrading services without any downtime.
Earlier, upgrades required shutting down complete systems, which resulted in
downtime and loss of performance. Now, OpenStack enables upgrading systems
while they are running by requiring only individual components to shut down.
Apart from these, OpenStack offers other remarkable features, such as networking,
compute, Identity Access Management, orchestration, etc.

5.7.3 Conceptual OpenStack Architecture


Fig. 5.7.2 depicts a magnified version of the architecture, showing the relationships
among different services and between the services and VMs. This expanded
representation is also known as the Conceptual architecture of OpenStack.

Fig. 5.7.2 : Conceptual architecture of OpenStack

From Fig. 5.7.2, we can see that every service of OpenStack depends on other services
within the system, and all these services exist in a single ecosystem working together to
produce a virtual machine. Any service can be turned on or off depending on the VM
required to be produced. These services communicate with each other through APIs and,
in some cases, through privileged admin commands.
Let us now discuss the relationship between various components or services specified
in the conceptual architecture of OpenStack. As you can see in Fig. 5.7.2, three
components, Keystone, Ceilometer and Horizon, are shown on top of the OpenStack
platform.
Here, Horizon is providing user interface to the users or administrators to interact with
underlying OpenStack components or services, Keystone is providing authentication to
the user by mapping the central directory of users to the accessible OpenStack services,
and Ceilometer is monitoring the OpenStack cloud for the purpose of scalability, billing,
benchmarking, usage reporting and other telemetry services. Inside the OpenStack
platform, you can see that various processes are handled by different OpenStack services;
Glance is registering Hadoop images, providing image services to OpenStack and
allowing retrieval and storage of disk images. Glance stores the images in Swift, which is
responsible for storing data in the form of objects and files and serving it back when required.
All other OpenStack components also store data in Swift, which also stores data or job
binaries. Cinder, which offers permanent block storage or volumes to VMs, also stores
backup volumes in Swift. Trove stores database backups in Swift and boots database
instances via Nova, which is the main computing engine that provisions and manages virtual
machines using disk images.
Neutron enables network connectivity for VMs and facilitates PXE Network for Ironic
that fetches images via Glance. VMs are used by the users or administrators to avail and
provide the benefits of cloud services. All the OpenStack services are used by VMs in
order to provide best services to the users. The infrastructure required for running cloud
services is managed by Heat, which is the orchestration component of OpenStack that
orchestrates clusters and stores the necessary resource requirements of a cloud
application. Here, Sahara is used to offer a simple means of providing a data processing
framework to the cloud users.
Table 5.7.1 shows the dependencies of these services.

Code Name | Dependent on | Optional
Nova (Compute) | Keystone, Horizon, Glance | Cinder, Neutron
Swift (Object Storage) | Keystone | -
Cinder (Block Storage) | Keystone | -
Glance (Image Service) | Swift, Keystone, Horizon | -
Neutron (Network) | Keystone, Nova | -
Keystone (Identity) | - | -
Horizon (Dashboard) | Keystone | -
Table 5.7.1 : Service Dependencies

5.7.4 Modes of Operations of OpenStack


OpenStack majorly operates in two modes - single host and multi host. A single host
mode of operation is that in which the network services are based on a central server,
whereas a multi host operation mode is that in which each compute node has a duplicate
copy of the network running on it and the nodes act like Internet gateways that are
running on individual nodes. In addition to this, in a multi host operation mode, the
compute nodes also individually host floating IPs and security groups. On the other
hand, in a single host mode of operation, floating IPs and security groups are hosted on
the cloud controller to enable communication.
Both single host and multi host modes of operation are widely used and have their
own sets of advantages and limitations. The single host mode of operation has a major
limitation : if the cloud controller goes down, the entire system fails because instances
stop communicating. This is overcome by the multi host operation mode, where a copy of
the network is provisioned to every node. However, the multi host mode has its own
limitation : it requires a unique public IP address for each compute node to enable
communication. In case public IP addresses are not available, using the multi host mode
is not possible.

5.8 Federation in the Cloud


Many cloud computing environments present difficulties in creating and
managing decentralized provisioning of cloud services, along with maintaining consistent
connectivity between untrusted components and fault tolerance. Therefore, to overcome
such challenges, the federated cloud ecosystem has been introduced by associating multiple
cloud computing providers using a common standard. Cloud federation includes services
from different providers aggregated in a single pool supporting three essential
interoperability features : resource redundancy, resource migration, and combination
of complementary resources. It allows an enterprise to distribute workload
around the globe, move data between disparate networks and implement innovative
security models for user access to cloud resources. In federated clouds, the cloud
resources are provisioned through network gateways that connect public or external
clouds with private or internal clouds owned by a single entity and/or community clouds
owned by several co-operating entities.
A popular project on identity management for federated clouds, called the Geneva
Framework, was conducted by Microsoft. The Geneva Framework was principally
centered around claim-based access, where claims describe the identity attributes, the Identity
Metasystem characterizes a single identity model for the enterprise and federation, and
Security Token Services (STS) are utilized in the Identity Metasystem to assist with user
access management across applications regardless of location or architecture.
In this section we are going to see federation in the cloud by using the interdomain
federation technologies Jabber XCP (Jabber Extensible Communications Platform) and the
IETF (Internet Engineering Task Force) standard protocol XMPP (Extensible Messaging
and Presence Protocol), which have been adopted for cloud federation by many popular
companies like Google, Facebook, Twitter, etc.

1. Jabber XCP
Instant Messaging (IM) allows users to exchange messages that are delivered
synchronously. As long as the recipient is connected to the service, the message will be
pushed to it directly. This can either be realized using a centralized server or peer to peer
connections between each client.
The Jabber Extensible Communications Platform (Jabber XCP) is a commercial IM
server created by Cisco in association with Sun Microsystems. It is a highly
programmable presence and messaging platform that supports the exchange of
information between applications in real time. It supports multiple protocols such as the
Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions
(SIMPLE) and the Instant Messaging and Presence Service (IMPS). Being highly
programmable and scalable, it is ideal for adding presence and messaging to existing
applications or services and for building next-generation, presence-based solutions.

2. XMPP (Extensible Messaging and Presence Protocol)


The Extensible Messaging & Presence Protocol (XMPP) is an open standard for instant
messaging. It is published in Request For Comments (RFCs) by the Internet Engineering
Task Force (IETF) and can be used freely. The protocol messages are formatted using
XML, which allows extending the messages with additional XML formatted information.
It was previously known as Jabber. Although in principle a protocol does not dictate the
underlying architecture, XMPP is a single-endpoint protocol. XMPP clients connect to one
machine of the architecture and transmit and receive both instant messages and presence
updates. In XMPP, users are identified by a username and domain name. Users can be
connected multiple times using the same username, which allows a user to keep an IM&P
connection at work while the client at home is still running. Each connection is identified
by a unique resource, which is combined with the username and domain name to yield a
unique Jabber Identifier (JID).
In cloud architectures, web services play an important role in provisioning resources
and services. However, the protocols used by current cloud services, like SOAP
(Simple Object Access Protocol) and other assorted HTTP-based protocols, can only
perform one-way information exchanges. Because of this, cloud services face challenges
related to scalability, real-time communication and traversal of firewall rules. Therefore,
in search of a solution, many researchers have found XMPP (also called Jabber) to be a
convenient protocol that can overcome those barriers and can be used effectively in cloud
solutions. Many cloud pioneers such as Google, IBM and Apple have already
incorporated this protocol into their cloud-based solutions over the last few years.
XMPP is advantageous and a good match for cloud computing because of the following
benefits :
a. It is decentralized and supports easy two-way communication
b. It doesn’t require polling for synchronization
c. It has built-in publish subscribe (pub-sub) functionality
d. It works on XML based open standards
e. It is perfect for Instant Messaging features and custom cloud services
f. It is efficient and scales up to millions of concurrent users on a single service
g. It supports worldwide federation models
h. It provides strong security using Transport Layer Security (TLS) and Simple
Authentication and Security Layer (SASL)
i. It is flexible and extensible.
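To make the protocol concrete, the short sketch below uses the slixmpp Python library, one of several XMPP client libraries, to log in with a JID and send a single instant message; the JID, password and recipient are placeholders.

import slixmpp

class OneShotSender(slixmpp.ClientXMPP):
    """Connects with a JID, sends one chat message and disconnects."""
    def __init__(self, jid, password, recipient, body):
        super().__init__(jid, password)
        self.recipient = recipient
        self.body = body
        self.add_event_handler("session_start", self.on_start)

    async def on_start(self, event):
        self.send_presence()                                   # announce presence
        await self.get_roster()                                # fetch the contact list (roster)
        self.send_message(mto=self.recipient, mbody=self.body, mtype="chat")
        self.disconnect()

# xmpp = OneShotSender("alice@example.com", "secret", "bob@example.org", "Hello from the cloud")
# xmpp.connect()
# xmpp.process(forever=False)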
In the current scenario, XMPP and XCP are extensively used for federation in the cloud
due to their unique capabilities. The next sections of this chapter explain the levels of
federation along with federated applications and services.

5.9 Four Levels of Federation


In real-time communication, federation defines how XMPP servers in different domains
exchange XML-based messages. As per XEP-0238, which describes the XMPP protocol
flows for inter-domain federation, there are four basic levels of federation, which are
shown in Fig. 5.9.1 and explained as follows.

Fig. 5.9.1 : Four levels of federation

Level 1 : Permissive federation


Permissive federation is the lowest level of federation, where a server accepts a
connection from a peer network server without confirming its identity using DNS
lookups or certificate checking. There are no minimum criteria to run permissive
federation. The absence of verification or validation (authentication) may lead to domain
spoofing (the unauthorized use of a third-party domain name in an email message so
as to pretend to be someone else). Since it has the weakest security mechanisms, it opens
the way for widespread spam and other abuses. Initially, permissive federation was the
only way to federate web applications, but with the arrival of the open-source
Jabberd 1.2 server, permissive federation became obsolete on the XMPP network.

Level 2 : Verified federation

Verified federation works at level 2 and runs above permissive federation. In
this level, the server accepts a connection from a peer network server only when the
identity of the peer is verified or validated. Peer verification is the minimum criterion to run
verified federation. It utilizes information acquired from DNS along with domain-specific
keys exchanged in advance. In this type, the connection is not encrypted, but because of
identity verification it effectively prevents domain spoofing. To make this work
effectively, the federation requires an appropriate DNS setup, yet it is still prone to DNS
poisoning attacks. Verified federation has been the default service approach on the open
XMPP network since the arrival of the open-source Jabberd 1.2 server. It acts as a
foundation for encrypted federation.

Level 3 : Encrypted federation


Encrypted federation is the third level of federation and runs above verified
federation. In this level, a server accepts a connection from a peer network server if and
only if the peer supports Transport Layer Security (TLS) as defined for XMPP in
RFC 3920. Support for TLS is the minimum criterion to run encrypted federation.
TLS is the successor to the Secure Sockets Layer (SSL), which was developed to secure
communications over HTTP. XMPP uses a TLS profile that enables two entities to
upgrade a connection from unencrypted to encrypted; here TLS is mainly used for
channel encryption. In encrypted federation, the peer usually presents a self-signed
digital certificate, which cannot be verified through an identity provider and therefore
prevents strong mutual authentication. In that case, both parties proceed to weakly
confirm identity using the Server Dialback protocol. XEP-0220 defines the Server
Dialback protocol, which is used between XMPP servers to provide an identity check.
Server Dialback uses DNS as the basis for verifying identity; the fundamental approach is
that when a receiving server gets a server-to-server connection request from an
originating server, it does not acknowledge the request until it has checked a key with the
authoritative server for the domain asserted by the originating server. Although Server
Dialback does not provide strong authentication or trusted federation, and although it is
liable to DNS poisoning attacks, it has successfully prevented most instances of address
spoofing on the XMPP network. The outcome is an encrypted connection with weak
identity verification, in which certificates are signed by the server itself.

Level 4 : Trusted federation


Trusted federation is the topmost level of federation and runs above encrypted
federation. In this level, a server accepts a connection from a peer network server if and
only if the peer supports TLS and can present a digital certificate issued by a root
Certification Authority (CA) that is trusted by the authenticating server. The use of digital
certificates in trusted federation results in both strong authentication and channel
encryption. The trusted root CAs are identified based on one or more factors like their
operating system environment, XMPP server software, or local service policy. The
use of trusted domain certificates effectively prevents DNS poisoning attacks but makes
federation more difficult, since such certificates have traditionally not been easy to obtain.
Here, certificates are signed by a trusted CA.

5.10 Federated Services and Applications


Server-to-server federation is required in order to build seamless real-time
communication in the cloud. The cloud typically comprises a considerable number
of clients, devices, services and applications connected to the network. To fully use
the capabilities of this cloud structure, a participant needs the ability to
discover other entities of interest. Such entities may be end users, real-time content
feeds, user directories, messaging gateways, and so on. XMPP uses service
discovery to find such entities. The discovery protocol enables any network
participant to query another entity about its identity, capabilities, and related
entities. Whenever a participant connects to the network, it queries the
authoritative server for its specific domain about the entities associated with that
authoritative server. In response to a service discovery query, the authoritative server
informs the inquirer about services hosted there and may also list services that are
accessible but hosted elsewhere. XMPP includes a technique for maintaining
lists of other entities, known as rosters, which enables end users to
keep track of various kinds of entities. For the most part, these lists consist of other
entities that the users are interested in or collaborate with regularly. Most XMPP
deployments include custom directories so that internal users of those
services can easily find what they are searching for.
Some organizations are wary of federation because they fear that real-time
communication networks will introduce the same types of problems that are endemic to
email networks, such as spam and viruses. While these concerns are not unfounded, they
tend to be exaggerated, for several reasons. Having learned from the past problems of
email, XMPP systems prevent address spoofing, inline scripts, unlimited binary
attachments, and other attack tactics. The use of point-to-point federation avoids the
problems of traditional multi-hop federation, as it restricts injection attacks, data loss and
unencrypted intermediate links. With federation, the communication network can also
ensure encrypted connections and strong authentication by using certificates issued by
trusted root CAs.

5.11 The Future of Federation


The success of federated communications is a precursor to building a
seamless cloud that can interact with people, devices, information feeds,
documents, application interfaces and so on. The power of a federated,
presence-enabled communications infrastructure is that it enables software developers and
service providers to build and deploy such applications without asking permission from a
large, centralized communications operator. The process of server-to-server federation
for the purpose of interdomain communication has played a large role in the
success of XMPP, which relies on a small set of simple but powerful mechanisms for
domain checking and security to generate verified, encrypted, and trusted connections
between any two deployed servers. These mechanisms have provided a stable, secure
foundation for the growth of the XMPP network and similar real-time technologies.
Summary

 The Apache Hadoop is an open source software project that enables distributed
processing of large data sets across clusters of commodity servers using
programming models.
 The Hadoop core is divided into two fundamental layers called HDFS and
MapReduce engine. The HDFS is a distributed file system inspired by GFS that
organizes files and stores their data on a distributed computing system, while
MapReduce is the computation engine running on top of HDFS as its data
storage manager.
 The HDFS follows a master-slave architecture using a name node and data nodes.
The Name node acts as the master while multiple Data nodes work as slaves.
 In the MapReduce model, the data processing primitives used are called mapper
and reducer. The mapper has a map method that transforms an input key-value pair
into any number of intermediate key-value pairs, while the reducer has a reduce
method that aggregates the intermediate key-value pairs into any number of
output key-value pairs.
 VirtualBox is a Type II (hosted) hypervisor that runs on Microsoft Windows,
Mac OS, Linux, and Solaris systems. It is ideal for testing, developing,
demonstrating, and deploying solutions across multiple platforms on single
machine using VirtualBox.

 Google App Engine (GAE) is a Platform-as-a-Service cloud computing model


that supports many programming languages like Java, Python, .NET, PHP,
Ruby, Node.js and Go in which developers can write their code and deploy it on
available google infrastructure with the help of Software Development Kit
(SDK).
 The Google provides programming support for its cloud environment through
Google File System (GFS), Big Table, and Chubby.
 The GFS is used for storing large amounts of data on google storage clusters,
BigTable offers a storage service for accessing structured as well as unstructured
data while Chubby is used as a distributed application locking service.
 OpenStack is an open-source, highly scalable cloud computing platform that
provides tools for developing private, public or hybrid clouds, along with a web
interface for users to access resources and admins to manage those resources.
 OpenStack architecture has many components to manage compute, storage,
network and security services.
 Cloud federation includes services from different providers aggregated in a
single pool supporting three essential interoperability features : resource
redundancy, resource migration, and combination of complementary resources.
 The Jabber XCP and XMPP (Extensible Messaging and Presence Protocol) are
two popular protocols used in federation of cloud by companies like Google,
Facebook, Twitter, etc.
 There are four levels of federation, namely permissive, verified, encrypted and
trusted federation, where trusted federation is the most secure and permissive
federation is the least secure.

Short Answered Questions

Q.1 What are the advantages of using Hadoop ? AU : Dec.-16


Ans. : The Apache Hadoop is an open source software project that enables distributed
processing of large data sets across clusters of commodity servers using programming
models. It is designed to scale up from a single server to thousands of machines, with a
very high degree of fault tolerance.
The advantages of Hadoop are listed as follows :
a. Hadoop is a highly scalable platform for data storage and processing.
b. It satisfies all four characteristics of big data like volume, velocity, variety and
veracity.
c. It is a cost-effective solution for Big data applications as it uses a cluster of


commodity hardware to store data.
d. It provides high throughput and low latency for computation-intensive jobs.
e. It is highly fault tolerant in nature, with features like self-healing and replication. It
automatically replicates data if a server or disk crashes.
f. It is flexible in nature as it supports different formats of data like structured,
unstructured and semi-structured.
g. It provides faster execution environment for big data applications.
h. It provides support for business intelligence by querying, reporting, searching,
filtering, indexing, aggregating the datasets.
i. It provides tools for report generation, trend analysis, search optimization, and
information retrieval.
j. It supports different types of analytics like predictive, prescriptive and descriptive
along with functions like as document indexing, concept filtering, aggregation,
transformation, semantic text analysis, pattern recognition, and searching.
Q.2 What is the purpose of heart beat in Hadoop. AU : Dec.-17

OR State the significance of heart beat message in Hadoop. AU : Dec.-19

OR Give the significance of heart beat message in Hadoop. AU : May-19


Ans. : In Hadoop, the Name node and Data nodes communicate using heartbeats. The
heartbeat is a signal sent by a Data node to the Name node at a regular time
interval to indicate its presence, i.e. to indicate that it is alive. Data nodes send a
heartbeat signal to the Name node every three seconds by default. The working of the
heartbeat is shown in Fig. 5.1.

Fig. 5.1 : Heartbeat in HDFS

If the Name node does not receive a heartbeat from a Data node for a certain period
(ten minutes by default), that particular Data node is declared dead. If the death of a
node causes the replication factor of data blocks to drop below its minimum value,
the Name node initiates additional replication to bring the blocks back to the normal
state.
Q.3 Name the different modules in Hadoop framework. AU : May-17
Ans. : The Hadoop core is divided into two fundamental modules called HDFS and
MapReduce engine. The HDFS is a distributed file system inspired by GFS that
organizes files and stores their data on a distributed computing system, while
MapReduce is the computation engine running on top of HDFS as its data storage
manager. Apart from that there are several other modules in Hadoop, used for data
storage, processing and analysis which are listed below :
a. HBase : Column-oriented NOSQL database service
b. Pig : Dataflow language and provides parallel data processing framework
c. Hive : Data warehouse infrastructure for big data
d. Sqoop : Transferring bulk data between Hadoop and structured data stores
e. Oozie : Workflow scheduler system
f. Zookeeper : Distributed synchronization and coordination service
g. Mahout : Machine learning tool for big data.
Q.4 “HDFS” is fault tolerant. Is it true ? Justify your answer. AU : Dec.-17
Ans. : Fault tolerance refers to the ability of the system to work or operate
uninterrupted even in case of unfavorable conditions (like components failure due to
disaster or by any other reason). The main purpose of this fault tolerance is to remove
frequently taking place failures, which occurs commonly and disturbs the ordinary
functioning of the system. The three main solutions which are used to produce fault
tolerance in HDFS are data replication, heartbeat messages and checkpoint and
recovery.
In data replication, HDFS stores multiple replicas of the same data across different
nodes based on the replication factor. HDFS uses an intelligent replica placement model
for reliability and performance. The same copy of data is positioned on several different
computing nodes, so when that data is needed it can be provided by any of those data
nodes. The major advantage of using this technique is that it provides instant recovery
from node and data failures. One main disadvantage is that it consumes more storage by
keeping the same data on multiple nodes.

In heartbeat messages, a message is sent by each data node to the name node at a
regular time interval to indicate its presence, i.e. to indicate that it is alive. If the name
node does not receive a heartbeat from a data node for a certain period, that particular
data node is declared dead. In that case, a replica node is used as the primary data node
to recover the data.
In checkpoint and recovery, a concept similar to rollback is used to tolerate
faults up to some point. After a fixed time interval, a copy of the state is saved and
stored; when a failure occurs, the system simply rolls back to the last save point and then
starts performing the transactions again.
Q.5 How does divide and conquer strategy related to MapReduce paradigm ?
AU : May-18
Ans. : In the divide and conquer strategy, a computational problem is divided into
smaller parts that are executed independently until all parts are completed, and the
partial results are then combined to obtain the desired solution of the problem.
MapReduce takes a set of input <key, value> pairs and produces a set of output
<key, value> pairs by passing the data through map and reduce functions. The typical
MapReduce operations are shown in Fig. 5.2.
Fig. 5.2 : MapReduce operations
In MapReduce, the mapper uses the divide approach, where the input data is split into
blocks; each block is represented as an input key-value pair. The unit of work in
MapReduce is a job. During the map phase, the input data is divided into input splits for
analysis, where each split is an independent task. These tasks run in parallel across
Hadoop clusters. A map function is applied to each input key/value pair, which does
some user-defined processing and emits new key/value pairs to intermediate storage
to be processed by the reducer. The reducer uses the conquer approach for combining the
results. The reducer phase uses the result obtained from the mapper as an input to generate
the final result. A reduce function is applied to the mapper's output, in parallel, to all values
corresponding to each unique map key, and generates a single output key/value pair.
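A classic concrete instance of this divide and conquer flow is word count written for Hadoop Streaming; under the assumption that the two scripts below are supplied as the mapper and the reducer, the map side emits <word, 1> pairs and the reduce side sums the counts for each unique key.

# mapper.py - emits a <word, 1> pair for every word read from standard input
import sys
for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")

# reducer.py - sums the counts of each word; Hadoop delivers the keys already sorted
import sys
current, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t")
    if word != current:
        if current is not None:
            print(f"{current}\t{total}")
        current, total = word, 0
    total += int(count)
if current is not None:
    print(f"{current}\t{total}")

Such a job is typically submitted with the Hadoop Streaming jar, for example : hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper "python3 mapper.py" -reducer "python3 reducer.py" -input /input -output /output.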
Q.6 How MapReduce framework executes user jobs ? AU : Dec.-18
Ans. : The unit of work in MapReduce is a job. During the map phase, the input data is
divided into input splits for analysis, where each split is an independent task. These
tasks run in parallel across the Hadoop cluster. The reducer phase uses the result
obtained from the mapper as an input to generate the final result. MapReduce takes a set
of input <key, value> pairs and produces a set of output <key, value> pairs by passing the
data through map and reduce functions. The typical MapReduce operations are shown in
Fig. 5.3.
Fig. 5.3 : MapReduce operations
Every MapReduce program undergoes different phases of execution. Each phase has
its own significance in MapReduce framework.
In the input phase, the large data set in the form of <key, value> pairs is provided as the
standard input for the MapReduce program. The input files used by MapReduce are kept
on the HDFS (Hadoop Distributed File System) store and have a standard InputFormat
specified by the user.
Once the input file is selected, the split phase reads the input data and divides it into
smaller chunks. The split chunks are then given to the mapper. The map operations
extract the relevant data and generate intermediate key-value pairs that are passed to the
combiner.
The combiner is used between the mapper and the reducer to reduce the volume of data
transfer. It is also known as a semi-reducer, which accepts input from the mapper and
passes the output key-value pairs to the reducer. The shuffle and sort are components of
the reducer. Shuffling is the process of partitioning and moving the mapped output to the
reducer, where intermediate keys are assigned to the reducers. The sort phase is
responsible for sorting the intermediate keys on a single node automatically before they
are presented to the reducer. The shuffle and sort phases occur simultaneously, where
mapped outputs are fetched and merged.
The reducer reduces a set of intermediate values which share unique keys to a set of
output values. The reducer uses the sorted input to generate the final output. The final
output of each MapReduce program is generated as key-value pairs written to an output
file, which in turn is written back to the HDFS store.
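The phases above can be imitated locally in a few lines of plain Python, which helps in reasoning about what split, map, shuffle/sort and reduce each contribute; this is only an in-memory simulation, not the Hadoop engine itself.

# Local simulation of the MapReduce phases (word count), not Hadoop itself.
from itertools import groupby
from operator import itemgetter

documents = ["big data needs big clusters", "clusters need data"]   # stands in for the input splits

# Map : every document produces intermediate <key, value> pairs
intermediate = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle + Sort : bring all values that share an intermediate key together
intermediate.sort(key=itemgetter(0))

# Reduce : aggregate the values of every unique key into one output pair
output = {key: sum(value for _, value in group)
          for key, group in groupby(intermediate, key=itemgetter(0))}
print(output)   # {'big': 2, 'clusters': 2, 'data': 2, 'need': 1, 'needs': 1}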
Q.7 What is Map - reduce ? enlist the features of Map - reduce framework.
Ans. : MapReduce is a programming model provided by Hadoop that allows
expressing distributed computations on huge amounts of data. It provides easy scaling of
data processing over multiple computational nodes or clusters.

Features of MapReduce
The different features provided by MapReduce are explained as follows :
 Synchronization : MapReduce supports the execution of concurrent tasks. When
concurrent tasks are executed, they need synchronization. Synchronization is
provided by reading the state of each MapReduce operation during execution and
by using shared variables.
 Data locality : In MapReduce, although the data resides on different clusters, it
appears local to the user's application. To obtain the best results, the code and the
data of an application should reside on the same machine.
 Error handling : The MapReduce engine provides different fault tolerance
mechanisms in case of failure. When tasks are running on different cluster
nodes and a failure occurs, the MapReduce engine finds out the
incomplete tasks and reschedules them for execution on different nodes.
 Scheduling : MapReduce involves map and reduce operations that divide
large problems into smaller chunks, which are run in parallel by different
machines. So, there is a need to schedule different tasks on computational nodes on
a priority basis, which is taken care of by the MapReduce engine.
Q.8 Enlist the features of Virtual Box.
Ans. : VirtualBox provides the following main features :
 It supports a fully paravirtualized environment along with hardware
virtualization.
 It provides device drivers from driver stack which improves the performance of
virtualized input/output devices.
 It provides shared folder support to copy data from host OS to guest OS and vice
versa.
 It has latest Virtual USB controller support.
 It facilitates broad range of virtual network driver support along with host, bridge
and NAT modes.
 It supports Remote Desktop Protocol to connect windows virtual machine (guest
OS) remotely on a thin, thick or mobile client seamlessly.
 It has Support for Virtual Disk formats which are used by both VMware and
Microsoft Virtual PC hypervisors.
Q.9 Describe google app engine.

Ans. : Google App Engine (GAE) is a platform-as-a-service cloud computing model


that supports many programming languages. GAE is a scalable runtime environment
mostly devoted to execute Web applications. In fact, it allows developers to integrate
third-party frameworks and libraries with the infrastructure still being managed by
Google. It allows developers to use readymade platform to develop and deploy web
applications using development tools, runtime engine, databases and middleware
solutions. It supports languages like Java, Python, .NET, PHP, Ruby, Node.js and Go in
which developers can write their code and deploy it on available google infrastructure
with the help of Software Development Kit (SDK). In GAE, SDKs are required to set up
your computer for developing, deploying, and managing your apps in App Engine.
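A minimal, hedged sketch of an App Engine standard-environment service in Python is shown below; it assumes the Flask microframework and the Python 3.9 runtime, which are common but not the only possible choices.

# main.py - minimal App Engine standard-environment web service (assumes Flask)
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello from Google App Engine"

# app.yaml deployed alongside main.py would contain, for example :
#   runtime: python39
# and the app is then deployed with :  gcloud app deploy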
Q.10 What are the core components of Google app engine architecture ?
Ans. : The infrastructure for GAE is composed of four main components : Google File
System (GFS), MapReduce, BigTable, and Chubby. The GFS is used for storing large
amounts of data on google storage clusters. The MapReduce is used for application
program development with data processing on large clusters. Chubby is used as a
distributed application locking services while BigTable offers a storage service for
accessing structured as well as unstructured data.
Q.11 Enlist the advantages of GFS.
Ans. : Google has designed a distributed file system, named GFS, to meet its
exacting demands of processing a large amount of data. GFS provides a file system
interface and different APIs for supporting different file operations such as create to
create a new file instance, delete to delete a file instance, open to open a named file and
return a handle, close to close a given file specified by a handle, read to read data from a
specified file and write to write data to a specified file.
The advantages of GFS are :
a. Automatic recovery from component failure on a routine basis.
b. Efficient storage support for large - sized files as a huge amount of data to be
processed is stored in these files. Storage support is provided for small - sized files
without requiring any optimization for them.
c. With workloads that mainly consist of two kinds of reads, large streaming reads and
small random reads, the system is performance conscious so that the small
reads are made steady rather than going back and forth, by batching and sorting
them while advancing through the file.
d. The system supports small writes without being inefficient, along with the usual
large and sequential writes through which data is appended to files.

e. Well-defined semantics are implemented.


f. Atomicity is maintained with the least overhead due to synchronization.
g. Provision of sustained bandwidth is given priority over reduced latency.
Q.12 What is the role of chubby in Google app engine ?
Ans. : Chubby is the crucial service in the Google infrastructure that offers storage and
coordination for other infrastructure services such as GFS and Bigtable. It is a coarse -
grained distributed locking service that is used for synchronizing distributed activities
in an asynchronous environment on a large scale. It is used as a name service within
Google and provides reliable storage for file systems along with the election of
coordinator for multiple replicas. The Chubby interface is similar to the interfaces that
are provided by distributed systems with advisory locks. However, the aim of
designing Chubby is to provide reliable storage with consistent availability.
Q.13 What is Openstack ? Enlist its important components.
Ans. : OpenStack is an open source highly scalable cloud computing platform that
provides tools for developing private, public or hybrid clouds, along with a web
interface for users to access resources and admins to manage those resources.
The different components of Openstack architecture are :
a. Nova (Compute)
b. Swift (Object storage)
c. Cinder (Block level storage)
d. Neutron (Networking)
e. Glance (Image Management)
f. Keystone (Identity management)
g. Horizon (Dashboard)
h. Ceilometer (Metering)
i. Heat (Orchestration)
Q.14 Explain the term “federation in the cloud”
Ans. : Many cloud computing environments present difficulties in creating
and managing decentralized provisioning of cloud services, along with maintaining
consistent connectivity between untrusted components and fault tolerance. Therefore,
to overcome such challenges, the federated cloud ecosystem has been introduced by
associating multiple cloud computing providers using a common standard. Cloud
federation includes services from different providers aggregated in a single pool
supporting three essential interoperability features : resource redundancy, resource
migration, and combination of complementary resources. It allows an enterprise to
distribute workload around the globe, move data between disparate networks and
implement innovative security models for user access to cloud resources. In federated
clouds, the cloud resources are provisioned through network gateways that connect
public or external clouds with private or internal clouds owned by a single entity
and/or community clouds owned by several co-operating entities.
Q.15 Mention the importance of Transport Level Security (TLS). AU : Dec.-16
Ans. : Transport Layer Security (TLS) is designed to provide security at the
transport layer. TLS was derived from a security protocol called the Secure Sockets Layer
(SSL). TLS ensures that no third party may eavesdrop on or tamper with any message.
The benefits of TLS are :
a. Encryption : TLS/SSL can help to secure transmitted data using encryption.
b. Interoperability : TLS/SSL works with most web browsers, including Microsoft
Internet Explorer and on most operating systems and web servers.
c. Algorithm Flexibility : TLS/SSL provides operations for authentication
mechanism, encryption algorithms and hashing algorithm that are used during
the secure session.
d. Ease of Deployment : Many applications use TLS/SSL transparently on a Windows
Server 2003 operating system.
e. Ease of Use : Because TLS/SSL is implemented beneath the application layer, most
of its operations are completely invisible to the client.
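The snippet below, which uses only Python's standard library, shows TLS in action : a TCP connection is wrapped in a TLS session that verifies the server certificate against the trusted root CAs and then encrypts all traffic; the host name is a placeholder.

import socket
import ssl

hostname = "www.example.com"                     # placeholder host
context = ssl.create_default_context()           # loads trusted root CAs and enables verification

with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as tls_sock:
        print(tls_sock.version())                # negotiated protocol, e.g. 'TLSv1.3'
        print(tls_sock.getpeercert()["subject"]) # identity taken from the verified certificate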
Q.16 Enlist the features of extensible messaging & presence protocol for cloud
computing.
Ans. : The features of extensible messaging & presence protocol for cloud computing
are :
a. It is decentralized and supports easy two-way communication.
b. It doesn’t require polling for synchronization.
c. It has built-in publish subscribe (pub-sub) functionality.
d. It works on XML based open standards.
e. It is perfect for instant messaging features and custom cloud services.
f. It is efficient and scales up to millions of concurrent users on a single service.
g. It supports worldwide federation models
h. It provides strong security using Transport Layer Security (TLS) and Simple
Authentication and Security Layer (SASL).
i. It is flexible and extensible.

Long Answered Questions

Q.1 Give a detailed note on Hadoop framework. AU : Dec.-16


Ans. : Refer section 5.1 and 5.1.1.

Q.2 Explain the Hadoop Ecosystem framework.


Ans. : Refer section 5.1 and 5.1.1.

Q.3 Explain the Hadoop Distributed File System architecture with a diagram.
AU : Dec.-18
Ans. : Refer section 5.2 and 5.2.1.
Q.4 Elaborate HDFS concepts with suitable diagram. AU : May-17
Ans. : Refer section 5.2 and 5.2.1.
OR Illustrate the design of Hadoop file system. AU : Dec.-19
Ans. : Refer section 5.2 and 5.2.1.
Q.5 Illustrate dataflow in HDFS during file read/write operation with suitable
diagrams. AU : Dec.-17
Ans. : HDFS follows a master-slave architecture using a name node and data nodes.
The name node acts as the master while multiple data nodes work as slaves. HDFS
is implemented as a block-structured file system where files are broken into blocks of
fixed size and stored across the Hadoop cluster. The HDFS architecture is shown in Fig. 5.4.

Fig. 5.4 : HDFS Architecture

The components of HDFS are composed of the following elements :


1. Name Node
An HDFS cluster consists of a single name node, called the master server, that manages
the file system namespace and regulates access to files by clients. It runs on commodity
hardware and stores all metadata for the file system
across the cluster. The name node serves as the single arbitrator and repository for HDFS
metadata, which is kept in main memory for faster random access. The entire file system
namespace is contained in a file called FsImage stored on the name node's file system,
while the transaction log is recorded in the EditLog file.
2. Data Node
In HDFS, multiple data nodes exist that manage the storage attached to the
nodes they run on. They are usually used to store users' data on the HDFS cluster.
Internally, a file is split into one or more blocks that are stored on the data nodes. The
data nodes are responsible for handling read/write requests from clients. They also
perform block creation, deletion and replication upon instruction from the name node.
A data node stores each HDFS data block in a separate file, and several blocks are stored
on different data nodes. The requirement of such a block-structured file system is to
store, manage and access file metadata reliably.
The representation of name node and data node is shown in Fig. 5.5.

Fig. 5.5 : Representation of name node and data nodes

3. HDFS Client
In Hadoop distributed file system, the user applications access the file system using
the HDFS client. Like any other file systems, HDFS supports various operations to read,
write and delete files, and operations to create and delete directories. The user
references files and directories by paths in the namespace. The user application does
not need to be aware that file system metadata and storage are on different servers, or that
blocks have multiple replicas. When an application reads a file, the HDFS client first
asks the name node for the list of data nodes that host replicas of the blocks of the file.
The client contacts a data node directly and requests the transfer of the desired block.
When a client writes, it first asks the name node to choose data nodes to host replicas of
the first block of the file. The client organizes a pipeline from node-to-node and sends
the data. When the first block is filled, the client requests new data nodes to be chosen
to host replicas of the next block. The choice of data nodes for each block is likely to be
different.

4. HDFS Blocks
In general, user data is stored in HDFS in terms of blocks. The files in the file system
are divided into one or more segments called blocks. The default size of an HDFS block is
64 MB, which can be increased as per need.
A. Read Operation in HDFS
The Read Operation in HDFS is shown in Fig. 5.6 and explained as follows.

Fig. 5.6 : Read Operation in HDFS


1. A client initiates a read request by calling the 'open()' method of the FileSystem object;
it is an object of type DistributedFileSystem.
2. This object connects to name node using RPC and gets metadata information such
as the locations of the blocks of the file. Please note that these addresses are of first
few blocks of a file.
3. In response to this metadata request, addresses of the Data Nodes having a copy
of that block is returned back.
4. Once addresses of Data Nodes are received, an object of type FSDataInputStream
is returned to the client. FSDataInputStream contains DFSInputStream which
takes care of interactions with the Data Node and Name Node. In step 4 shown in the
above diagram, the client invokes the 'read()' method, which causes DFSInputStream to
establish a connection with the first Data Node holding the first block of the file.
5. Data is read in the form of streams wherein client invokes 'read ()' method
repeatedly. This process of read () operation continues till it reaches the end of
block.


6. Once the end of a block is reached, DFSInputStream closes the connection and
moves on to locate the next Data Node for the next block.
7. Once the client has finished reading, it calls the close() method.

B. Write Operation in HDFS

The Write Operation in HDFS is shown in Fig. 5.7 and explained as follows

Fig. 5.7 : Write operation in HDFS

1. A client initiates write operation by calling 'create ()' method of Distributed File
system object which creates a new file - Step no. 1 in the above diagram.
2. The Distributed file system object connects to the Name Node using an RPC call and
initiates new file creation. However, this file create operation does not associate
any blocks with the file. It is the responsibility of the Name Node to verify that the file
(which is being created) does not exist already and a client has correct permissions
to create a new file. If a file already exists or client does not have sufficient
permission to create a new file, then IOException is thrown to the client.
Otherwise, the operation succeeds and a new record for the file is created by the
Name Node.
3. Once a new record in Name Node is created, an object of type
FSDataOutputStream is returned to the client. A client uses it to write data into
the HDFS. Data write method is invoked (step 3 in the diagram).

4. FSDataOutputStream contains DFSOutputStream object which looks after


communication with Data Nodes and Name Node. While the client continues
writing data, DFSOutputStream continues creating packets with this data. These
packets are enqueued into a queue called the DataQueue.
5. There is one more component called DataStreamer which consumes
this DataQueue. DataStreamer also asks Name Node for allocation of new blocks
thereby picking desirable Data Nodes to be used for replication.
6. Now, the process of replication starts by creating a pipeline using Data Nodes. In
our case, we have chosen a replication level of 3 and hence there are 3 Data Nodes
in the pipeline.
7. The DataStreamer pours packets into the first Data Node in the pipeline.
8. Every Data Node in the pipeline stores the packet it receives and forwards it to the
next Data Node in the pipeline.
9. Another queue, 'Ack Queue' is maintained by DFSOutputStream to store packets
which are waiting for acknowledgment from Data Nodes.
10. Once acknowledgment for a packet in the queue is received from all Data Nodes
in the pipeline, it is removed from the 'Ack Queue'. In the event of any Data Node
failure, packets from this queue are used to reinitiate the operation.
11. After the client is done writing data, it calls the close() method (step 9 in the
diagram). The call to close() results in flushing the remaining data packets to the
pipeline, followed by waiting for acknowledgment.
12. Once the final acknowledgment is received, the Name Node is contacted to inform it
that the file write operation is complete.
Q.6 Discuss MapReduce with suitable diagram. AU : May-17

OR Analyze how MapReduce framework supports parallel and distributed


computing on large data sets with a suitable example. AU : Dec.-19
OR Illustrate the Hadoop implementation of MapReduce framework.
AU : May-19
Ans. : Refer section 5.3.

Q.7 Develop a wordcount application with Hadoop MapReduce programming


model. AU : May-19


Ans. : The MapReduce takes a set of input <key, value> pairs and produces a set of
output <key, value> pairs by supplying data through map and reduce functions. Every
MapReduce program undergoes different phases of execution. Each phase has its own
significance in MapReduce framework. The different phases of execution in
MapReduce are shown in Fig. 5.8 and explained as follows.

Fig. 5.8 : Different phases of execution in MapReduce

Let us take an example of a word count application where the input is a set of words.
The input to the mapper has three sets of words : [Deer, Bear, River], [Car, Car,
River] and [Deer, Car, Bear]. These three sets are taken arbitrarily as input to the
MapReduce process. The various stages in MapReduce for wordcount application are
shown in Fig. 5.9.
In the input phase, the large data set in the form of <key, value> pairs is provided as
standard input to the MapReduce program. The input files used by MapReduce are kept
on the HDFS (Hadoop Distributed File System) store, which has a standard InputFormat
specified by the user.


Fig. 5.9 : Various stages in MapReduce for Wordcount application

Once the input file is selected, the splitting phase reads the input data and divides it
into smaller chunks, such as [Deer, Bear, River], [Car, Car, River] and [Deer, Car,
Bear] as separate sets. The split chunks are then given to the mapper.
The mapper performs the map operation, extracts the relevant data and generates
intermediate key-value pairs. It reads input data from a split using a record reader and
generates intermediate results like [Deer:1; Bear:1; River:1], [Car:1; Car:1; River:1] and
[Deer:1; Car:1; Bear:1]. It transforms the input key-value list into an output key-value
list, which is then passed to the combiner.
The shuffle and sort are components of the reducer side. Shuffling is the process of
partitioning and moving the mapped output to the reducers, where intermediate keys are
assigned to a reducer. Each partition is called a subset, and each subset becomes input to
a reducer. In general, the shuffle phase ensures that the partitioned splits reach the
appropriate reducers, where each reducer uses the HTTP protocol to retrieve its own
partition from the mappers. The output of this stage would be [Deer:1, Deer:1], [Bear:1,
Bear:1], [River:1, River:1] and [Car:1, Car:1, Car:1].
The sort phase is responsible for sorting the intermediate keys on a single node
automatically before they are presented to the reducer. The shuffle and sort phases
occur simultaneously, where mapped outputs are fetched and merged. It sorts all
intermediate results alphabetically, like [Bear:1, Bear:1], [Car:1, Car:1, Car:1], [Deer:1,
Deer:1] and [River:1, River:1]. The combiner is used with both the mapper and reducer to
reduce the volume of data transfer. It is also known as a semi-reducer, which accepts input
from the mapper and passes the output key-value pairs to the reducer. The output of this
stage would be [Bear:2], [Car:3], [Deer:2] and [River:2].

The reducer reduces each set of intermediate values that share a unique key to a smaller
set of values. The reducer uses the sorted input to generate the final output. The final
output is written by the reducer using a record writer into the output file, with a standard
output format, like [Bear:2, Car:3, Deer:2, River:2]. The final output of every MapReduce
program consists of key-value pairs written to an output file, which is written back to the
HDFS store. The word count process using MapReduce, with all phases of execution, is
illustrated in Fig. 5.9.
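As a complement to Fig. 5.9, the sketch below expresses the same word count using the Hadoop MapReduce Java API, with the reducer also registered as a combiner as described above. It is only an outline of the standard word count program; a complete Eclipse-based build and run of an equivalent program is worked out step by step in Lab 8.2 at the end of this book.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount
{
    // Map phase : emits (word, 1) for every token in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>
    {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException
        {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens())
            {
                word.set(itr.nextToken());
                context.write(word, one);   // e.g. (Deer, 1), (Bear, 1), ...
            }
        }
    }

    // Reduce phase : sums the grouped values; the same class also serves as the combiner.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable val : values)
            {
                sum += val.get();           // e.g. Bear : 1 + 1 = 2
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Job driver : wires the input/output paths, mapper, combiner and reducer together.
    public static void main(String[] args) throws Exception
    {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}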
Q.8 Explain the functional architecture of the Google cloud platform for app
engine in detail.
Ans. : Refer section 5.5.

Q.9 Write a short note on Google file system.


Ans. : Refer section 5.6.1.

Q.10 Explain the functionality of Chubby.


Ans. : Refer section 5.6.3.

Q.11 Explain the significance of Big table along with its working.
Ans. : Refer section 5.6.2.

Q.12 Explain in brief the conceptual architecture of Openstack.


Ans. : Refer section 5.7.1.

Q.13 Write a short note on levels of federation in cloud.


Ans. : Refer section 5.9.



Cloud Computing Lab
Contents
Lab 1 : Install Virtual Box and KVM with different flavors of Linux or Windows on the top of host OS ... L - 2
Lab 2 : Install C compiler in Virtual machine created using VirtualBox and execute simple programs ... L - 10
Lab 3 : Install GoogleApp Engine, create helloworld app, and other simple web applications using Python ... L - 12
Lab 4 : Use GAE launcher to launch web application ... L - 14
Lab 5 : To Simulate Cloud scenario using CloudSim and run a scheduling algorithm into it ... L - 16
Lab 6 : Find a Procedure to transfer files from one VM to another VM in VirtualBox ... L - 21
Lab 7 : To demonstrate installation and Configuration of Open stack Private cloud ... L - 22
Lab 8 : Install Hadoop Single node cluster and simple application like Wordcount ... L - 40
Lab 9 : Explore Storage as a service using own Cloud for remote file access using web interfaces ... L - 64
Lab 10 : To Create and access Windows Virtual machine using AWS EC2 ... L - 69
Lab 11 : To host a word press website using Light sail service in AWS ... L - 79
Lab 12 : To demonstrate Storage as a service using Amazon S3 ... L - 85


Lab 1 : Install Virtual Box and KVM with different flavors of Linux or windows on the
top of host OS

I. Hosted Virtualization on Oracle Virtual Box Hypervisor

Step 1 : Download Oracle Virtual box from


https://www.virtualbox.org/wiki/Downloads

Step 2 : Install it in Windows, once the installation has done open it.


Step 3 : Create Virtual Machine by clicking on New

Step 4 : Specify RAM Size, HDD Size, and Network Configuration and Finish the
wizard


Step 5 : To select the media for installation, click on Start and browse for the ISO file

In this example we are selecting Ubuntu Linux iso.

Step 6 : Complete the Installation and use it.


Now complete the installation following the standard on-screen instructions and start using it.


Ubuntu Installation Screen

Ubuntu Running inside VirtualBox

Step 7 : To connect the OS to the network, change the network mode to Bridged Adapter


Similarly, you can install Windows as shown below.

Step 1 : Click on new virtual machine and select Operating system Type as Windows
and version as Windows 10 along with the name Windows Insider Preview.


Step 2 : Perform steps 2 and 3 the same as for the Ubuntu installation. In step 4, select the ISO file of
Windows 10 instead of Ubuntu and complete the installation. Once the installation is done,
Windows will be available as shown below.

II. Hosted Virtualization on KVM Hypervisor

The Steps to Create and run Virtual machines in KVM are as follows

1) Check whether CPU has hardware virtualization support

KVM only works if your CPU has hardware virtualization support - either Intel VT-x or
AMD-V. To determine whether your CPU includes these features, run the following
command :
#sudo grep -c "svm\|vmx" /proc/cpuinfo

A 0 indicates that your CPU doesn’t support hardware virtualization, while a 1 or more
indicates that it does.

2) Install KVM and supporting packages

Virt-Manager is a graphical application for managing your virtual machines. You can use
the kvm command directly, but libvirt and Virt-Manager simplify the process.


#sudo apt-get install qemu-kvm libvirt-bin bridge-utils virt-manager

3) Create User

Only the root user and users in the libvirtd group have permission to use KVM virtual
machines. Run the following commands to create a user (here tsec) and add it to the libvirtd group :
#sudo adduser tsec
#sudo adduser tsec libvirtd
After running these commands, log out and log back in as tsec.

4) Check whether everything is working correctly

Run the following command after logging back in as tsec; you should see an empty list of
virtual machines. This indicates that everything is working correctly.
#virsh -c qemu:///system list

5) Open Virtual Machine Manager application and Create Virtual Machine


#virt-manager

6) Create and run Virtual Machines


Lab 2 : Install C compiler in Virtual machine created using VirtualBox and execute
simple programs

In Lab 1, we have already created an Ubuntu Linux virtual machine. Now let us see how
to install a 'C' compiler inside that virtual machine and execute programs. The GCC
package needs to be installed to use the C compiler.
The GNU Compiler Collection (GCC) is a collection of compilers and libraries for C, C++,
Objective-C, FORTRAN, Ada, Go, and D programming languages. Many open-source
projects, including the GNU tools and the Linux kernel, are compiled with GCC. The
steps for installing GCC and running C programs are as follows.

Step 1 : Installing GCC on Ubuntu


The default Ubuntu repositories contain a meta-package named build-essential that
contains the GCC compiler and a lot of libraries and other utilities required for compiling
software. First start by updating the packages list.
$ sudo apt update
Now install the build-essential package by using following command.
$ sudo apt install build-essential

Step 2 : Check the GCC version for the 'C' compiler


To validate that the GCC compiler is successfully installed, use the gcc --version
command which prints the GCC version :
$ gcc --version


The output will show the appropriate GCC version like gcc
(Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0

Step 3 : Write C program using Gedit editor


To use Gedit, open a terminal (Applications → Accessories → Terminal). Open gedit by typing
"gedit" on the terminal, or open the Gedit application from the application menu.

Step 4 : Write a simple Hello world program using C.


Inside the gedit editor, write a simple Hello World program as shown below.

#include <stdio.h>
int main(void)
{
    printf("HelloWorld\n");
    return 0;
}

Save this file as “hello.c”, compile it on terminal using command,


$gcc -o hello hello.c
Here -o is used to create a separate output file for our program otherwise it can be run
through a.out file. As we have created a separate output file, run the program using
command
$./hello
The output of the program is shown below.


Lab 3 : Install GoogleApp Engine, create helloworld app, and other simple web
applications using Python.

A) First install the Cloud SDK and then set up a Cloud project for App Engine :

1. Download and install Cloud SDK :

Note : If you already have the Cloud SDK installed, update it by running the following
command :
gcloud components update

2. Create a new project :


gcloud projects create [YOUR_PROJECT_ID] --set-as-default

3. Verify the project was created :


gcloud projects describe [YOUR_PROJECT_ID]
You can see project details as like the following :
createTime : year-month-hour
lifecycleState : ACTIVE
name: project-name
parent:
id : '433637338589'
type : organization
projectId : project-name-id
projectNumber : 499227785679


4. Initialize your App Engine app with your project and choose its region :
gcloud app create --project=[YOUR_PROJECT_ID]
When prompted, select the region where you want your App Engine application located.

5. Make sure billing is enabled for your project. A billing account needs to be linked to
your project in order for the application to be deployed to App Engine.

6. Install the following prerequisites :

 Download and install Git.


 Run the following command to install the gcloud component that includes the App
Engine extension for Python 3.7 :
gcloud components install app-engine-python

7. Prepare your environment for Python development.

It is recommended that you have the latest version of Python, pip, and other related tools
installed on your system. For instructions, refer to the Python Development Environment
Setup Guide. This quick start demonstrates a simple Python app written with the Flask
web framework that can be deployed to App Engine. Although this sample uses Flask,
you can use any web framework that satisfies the requirements above. Alternative
frameworks include Django, Pyramid, Bottle, and web.py.

B) Download the Hello World app and run it locally

In the last section, we created a simple Hello World app for Python 3.7. The next section
explains how to deploy the app to Google Cloud.

1. Clone the Hello World sample app repository to your local machine.
git clone https://github.com/GoogleCloudPlatform/python-docs-samples
Alternatively, you can download the sample as a zip file and extract it.

2. Change to the directory that contains the sample code.


cd python-docs-samples/appengine/standard_python37/hello_world

3. Run Hello World on your local machine

To run the Hello World app on your local computer, the steps differ slightly between
Mac OS / Linux and Windows; the steps below use PowerShell on Windows to run your Python packages.
a) Locate your installation of PowerShell.
b) Right-click on the shortcut to PowerShell and start it as an administrator.
c) Create an isolated Python environment in a directory external to your project and
activate it :

python -m venv env


env\Scripts\activate
d) Navigate to your project directory and install dependencies :
cd YOUR_PROJECT
pip install -r requirements.txt
e) Run the application :
python main.py
f) In your web browser, enter the following address :
http://localhost:8080
The Hello World message from the sample app displays on the page. In your terminal
window, press Ctrl+C to exit the web server.

C) Deploy and run Hello World on App Engine

To deploy your app to the App Engine standard environment :

1. Deploy the Hello World app by running the following command from the
standard_python37/hello_world directory:
gcloud app deploy

2. Launch your browser to view the app at
https://PROJECT_ID.REGION_ID.r.appspot.com
where PROJECT_ID represents your Google Cloud project ID, or run :
gcloud app browse
This time, the page that displays the Hello World message is delivered by a web server
running on an App Engine instance.
Source : https://cloud.google.com/appengine/docs/standard/python3/quickstart

Lab 4 : Use GAE launcher to launch web application

The steps to launch web application in Google app engine are as follows

Step 1 - Creating a Google Cloud Platform project

To use Google's tools for your own site or app, you need to create a new project on
Google Cloud Platform. This requires having a Google account.
1. Go to the App Engine dashboard on the Google Cloud Platform Console and press
the Create button.


2. If you've not created a project before, you'll need to select whether you want to
receive email updates or not, agree to the Terms of Service, and then you should be
able to continue.
3. Enter a name for the project, edit your project ID and note it down. For this tutorial,
the following values are used :
 Project Name : GAE Sample Site
 Project ID : gaesamplesite
4. Click the Create button to create your project.

Step 2 - Creating an application

Each Cloud Platform project can contain one App Engine application. Let's prepare an
app for our project.
1. We'll need a sample application to publish. If you've not got one to use, download
and unzip this sample app.
2. Have a look at the sample application's structure - the website folder contains your
website content and app.yaml is your application configuration file.
3. Your website content must go inside the website folder, and its landing page must
be called index.html, but apart from that it can take whatever form you like.
4. The app.yaml file is a configuration file that tells App Engine how to map URLs to
your static files. You don't need to edit it.

Step 3 - Publishing your application

Now that we've got our project made and sample app files collected together, let's publish
our app.
1. Open Google Cloud Shell.
2. Drag and drop the sample-app folder into the left pane of the code editor.
3. Run the following in the command line to select your project:
gcloud config set project gaesamplesite

4. Then run the following command to go to your app's directory:


cd sample-app

5. You are now ready to deploy your application, i.e. upload your app to App Engine:
gcloud app deploy

6. Enter a number to choose the region where you want your application located.


7. Enter Y to confirm.

8. Now navigate your browser to your-project-id.appspot.com to see your website


online. For example, for the project ID gaesamplesite, go to
gaesamplesite.appspot.com.

Source : https://developer.mozilla.org/en-US/docs/Learn/Common_questions/How_do_you_host_your_website_on_Google_App_Engine

Lab 5 : To Simulate Cloud scenario using CloudSim and run a scheduling algorithm
into it.

A) Installation of CloudSim in Eclipse

1. Open Eclipse, go to the menu bar and click File, then New, and finally select Java Project,
as shown in Fig. 1.

Fig. 1 : Open Eclipse and select Java Project

2. A new window will open. Follow these steps :

1. Enter the project name (here it is named CloudIntro).

2. In the next line you will see the path where your project will be created, as shown
in Fig. 2.
3. Next, select the JRE environment.
4. Finally, click Finish.

Fig. 2 : Give the project name, select the runtime environment and click Finish

3. Once you hit Finish, an empty project named CloudIntro will be created in the project
list, as shown in Fig. 3.

Fig. 3 Project Folder Location

4. The next step is to go to the project CloudIntro and right click on it. Click Import, as shown in
Fig. 4.


Fig. 4 Import cloud sim tool files and subsequent folders

5. A new window will open; now click File System, as shown in Fig. 5.

Fig. 5 Next to select is File System

6. The next step is to go to the directory where you have extracted the CloudSim tool. Fig. 6
guides you to the directory where your cloudsim folder is located.


Fig. 6 : Go to the directory and select the CloudSim folder

7. Select the cloudsim and click Finish as shown in the Fig. 7.

Fig. 7 Select Cloudsim and Hit finish

8. Now go to the link
http://commons.apache.org/proper/commons-math/download_math.cgi
Download the file named "commons-math3-3.4.1-bin.zip" and unzip it. We need its jar
files for math functions.

9. Now go to the left side of the Eclipse tool, in the project bar. Go to the jar folder and right click on
it. Click Import, as shown in Fig. 8.


Fig. 8 Import jar files for math calculations

10. Now go to the folder where you placed the downloaded and extracted file described in
point 8. Then select that jar file and hit Finish, as shown in Fig. 9.

Fig. 9 Import only jar

11. Finally, CloudSim is installed into your Eclipse environment.

Now write a program for VM scheduling and run it inside CloudSim. Example programs
are available at :
a) http://www.cloudbus.org/cloudsim/examples.html
b) https://www.cloudsimtutorials.online/how-to-do-virtual-machine-and-task-
scheduling-in-cloudsim/


Lab 6 : Find a Procedure to transfer files from one VM to another VM in VirtualBox

A shared folder is a folder which makes its files available on both the guest machine and
the host machine at the same time. Creating a shared folder between the guest and the
host allows you to easily manage files which should be present on both machines. The
course virtual machines are ready to use shared folders right away, but if you are using
the virtual machine on your personal computer you will need to specify which folder to
use as shared storage.

6.1 Shared Folders on SCS Lab Computers using Course VMs

If you are using a course VM on a lab computer, it is likely that a shared folder has
already been setup for you. On the desktop of your course VM you should notice a folder
titled Shared Folders. Inside of this you will find any folders that have been shared
between the course VM and lab computers. You should see two folders that have already
been configured for you: Z_DRIVE and Temp. Z_DRIVE gives you access to your
Windows Account Z:\ drive. This is storage that is persistent to your SCS account and
available as a network drive on the lab computers.
Temp gives you access to the folder found at D:\temp on the lab computer. Files stored in
this folder are local to the machine, meaning that they can be accessed faster, but will
delete from the system when you log out.
If you are working with data that you will need to use again, use the Z_DRIVE for your
shared folder. If you need faster read/write speed, use the Temp folder, but remember to
back up your files or they will be deleted when you log off the computer.

6.2 Shared Folders on Personal Computers

If you are using your own personal machine, you will need to configure VirtualBox to
look in the right place for your shared files.
First, click on the guest machine you intend to share files with. From there, you can select
the guest Settings and navigate to Shared Folders on the left side menu. To create a new
shared folder, either click the New Folder icon on the right menu or right click the
empty list of shared folders and click Add Shared Folder. From here, there are six options.
 Folder Path : The folder name on the host machine. Click the drop down menu and
navigate to the folder you would like to share.
 Folder Name : This is the name of the folder as it will appear on the guest machine.
 Read-Only : If you check read-only, the guest machine will be unable to write
changes to the folder. This is valuable when you only want to send files to the
virtual machine, but do not want to risk having the files modified by the guest.
 Auto-Mount : When any external storage is connected to a computer it must be
mounted in order to be used. It is recommended that you turn on auto-mounting,
unless you are familiar with the process of mounting a drive yourself.
 Mount Point : Unless you already know about mount points, leave this blank.
 Make Permanent : If you check this, the shared folder will be a permanent machine
folder. If it is not checked, the folder will not be shared after a shutdown.
 On the course virtual machines, when you load into the desktop, you should see a
folder labelled SharedFolders. In there you will see any folders that are currently
mounted and being shared.

Dragging and Dropping Files in VirtualBox

If you only need to transfer a few files quickly, you can simply drag and drop the files in.
On the top bar of the running guest machine, click on Devices > Drag and Drop and make
sure that Bidirectional is selected. This means that you will be able to drag files from the
host to the guest and from the guest to the host. Once bidirectional drag and drop is
checked, you should be able to begin dragging and dropping files.
You can also drag files from the guest machine into the host. To do this, simply open the
file browser on the host to where you would like to drop the files and drag the files from
the virtual machine into the file browser of the host. File transfers should be pretty quick;
if the virtual machine seems stuck when transferring, simply cancel the transfer and try
again.
Source :
https://carleton.ca/scs/tech-support/virtual-machines/transferring-files-to-and-from-
virtual-machines/

Lab 7 : To demonstrate installation and Configuration of Open stack Private cloud.

The OpenStack installation can be done in many ways, for example using RDO Packstack,
Mirantis or DevStack, which provide a series of shell scripts that carry out an automated
installation of OpenStack. DevStack is a series of extensible scripts used to quickly bring up
a complete OpenStack environment based on the latest versions of everything from git
master.

To install OpenStack using DevStack, the prerequisites are an Intel or AMD multicore CPU,
minimum 6-8 GB RAM, a 250 GB hard disk, a preinstalled Ubuntu Server/Desktop
operating system version 16.04 or above, and an internet connection of at least 4 Mbps.
(The installation steps can be found at https://docs.openstack.org/devstack/latest/ )
The steps for installing Openstack using Devstack in a single server (All in one Single
machine setup) are given as follows.

Step 1 : Update the ubuntu repository and install git package

The current version of Ubuntu OpenStack is Newton. So, that’s what we are going to
install. To begin with the installation, first, we need to use the git command to clone
devstack.
$sudo apt-get update

$sudo apt-get install git

Step 2 : Download the latest git repository for openstack


$ git clone https://git.openstack.org/openstack-dev/devstack


Step 3 : Open the devstack directory and start the installation by executing the stack.sh shell script
$cd devstack

$./stack.sh

At the initial stage, the installer will ask for passwords for the database, RabbitMQ, service
authentication, Horizon and Keystone.


The installer may take up to 30 minutes to complete the installation, depending on the
internet bandwidth. Once the installation is done you may see the following screen, which
displays the IP address of the dashboard (Horizon) through which you can gain access to
OpenStack VMs and resources.

As you can see, two users have been created for you : admin and demo. Your password is
the password you set earlier. These are the usernames you will use to log in to the
OpenStack Horizon dashboard.
Open up a browser and put the Horizon dashboard address, e.g.
http://192.168.0.116/dashboard, in your address bar; you should see a login page like this.


To start with, log in with the admin user's credentials. From the admin panel you will need to
use the demo user, or create a new user, to create and deploy instances. Take note of the
Horizon web address listed in your terminal.

Creating and running Instances

To launch an instance from OpenStack dashboard, first we need to finish following steps :
 Create a Project and add a member to the Project
 Create Image and Flavor
 Create Network for the Project
 Create Router for the Project
 Create a Key pair

A) Create a Project and add a member to the Project.

Login to the dashboard using Admin credentials and Go to Identity Tab –> Projects and
Click on Create Project.


Click on "Create Project". We can also set the quota for the project from the Quota tab. To
create users, go to the Identity tab –> Users –> click on the 'Create User' button, then specify
the user name, email, password, primary project and role, and click on Create User to add
the user to the OpenStack workspace.

B) Create Image and Flavor

To create a flavor, log in to the dashboard using admin credentials, go to the Admin tab –>
Flavors –> click on Create Flavor.


Specify the flavor name (fedora.small), VCPUs, root disk, ephemeral disk and swap disk.

To create an image, go to the Admin tab –> Images –> click on Create Image.

Specify the image name, description and image source (in this case a Fedora image file in
QCOW2 format, already downloaded from the Fedora website).


C) Create Network for the Project.

To create the network and router for the project, sign out of the admin user and log in as
the local user in the dashboard.
For convenience the network is set up as follows :
Internal Network = 10.10.10.0/24
External Network or Floating IP Network = 192.168.1.0/24
Gateway of External Network = 192.168.1.1

Now, go to the Network tab —> click on Networks —> then click on Create Network.
Specify the network name as internal.


Click on Next. Then specify the subnet name (sub-internal) and network address
(10.10.0.0/24).

Click on Next. Now, VMs will get an internal IP from the DHCP server because we
enabled the DHCP option for the internal network.

Now create the external network. Click on "Create Network" again and specify the network
name as "external".


Click on Next. Specify the subnet name as "sub-external" and the network address as
"192.168.1.0/24".

Click on Next. Untick the "Enable DHCP" option and specify the IP address pool for the
external network.


Click on Create.

D) Create Router for the Project

Now it is time to create a router. To create a router, go to the Network tab –> Routers –> click
on '+ Create Router'.

Now mark the external network as "External"; this task can be completed only by the admin
user, so log out from the local user and log in as admin.
Go to the Admin tab —> Networks –> click on Edit Network for "external".

Click on Save Changes. Now log out from the admin user and log in as the local user. Go to the
Network tab —> Routers –> for Router1 click on "Set Gateway".


Click on "Set Gateway"; this will add an interface on the router and assign it the first IP of
the external subnet (192.168.1.0/24).

Add an internal interface to the router as well : click on "router1", select "Interfaces" and
then click on "Add Interface".
The network part is now complete and we can view the network topology from the "Network
Topology" tab, as below.


Now Create a key pair that will be used for accessing the VM and define the Security
firewall rules.

E) Create a key pair

Go to ‘Access & Security’ Tab -> Click on Key Pairs -> then click on ‘Create Key Pair‘

It will create a key pair with the name "myssh-keys.pem". Add a new security group with
the name 'fedora-rules' from the Access & Security tab, allowing port 22 and ICMP from the
Internet (0.0.0.0/0).


Once the security group 'fedora-rules' is created, click on Manage Rules and allow port 22 and
ICMP ping.

Click on Add. Similarly, add a rule for ICMP.

F) Launch Instance

Now it is finally time to launch an instance. To launch an instance, go to the Compute tab –>
click on Instances –> then click on 'Launch Instance'. Then specify the instance name, the
flavor that we created in the above steps, choose 'Boot from image' as the Instance Boot Source
option, and select the image name 'fedora-image'.


Click on 'Access & Security' and select the security group 'fedora-rules' and the key pair
'myssh-keys'.

Now select Networking, add the 'internal' network and then click on Launch.


Once the VM is launched, associate a floating IP so that we can access the VM.

Click on 'Associate Floating IP' to get a public IP address.


Click on Allocate IP.

Click on Associate

Now try to access the VM with floating IP ( 192.168.1.20) using keys.


As we can see above, we are able to access the VM using the keys. Our task of launching a
VM from the dashboard is now complete.

Lab 8 - Install Hadoop Single node cluster and simple application like Wordcount.

8.1 Installation of Hadoop single node cluster on Ubuntu 16.04


In a single-node setup, the name node and data node run on the same machine. The detailed steps to
install Hadoop on Ubuntu 16.04 are explained as follows.
Step 1 - Update the Ubuntu
$ sudo apt-get update

Step 2 - Install JDK


$ sudo apt-get install default-jdk


Verify the Java Version

Step 3 - Add dedicated hadoop users


$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

Step 4 - Install SSH


$ sudo apt-get install ssh


Verify SSH using which command

Step 5 - Create and setup SSH certificates

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus our local
machine. For our single-node setup of Hadoop we therefore need to configure SSH
access to localhost. So, we need to have SSH up and running on our machine and
configure it to allow SSH public key authentication.
$ ssh-keygen -t rsa -P ""


Add the newly created key to the list of authorized keys so that Hadoop can use ssh
without prompting for a password.
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Check the SSH to localhost


$ ssh localhost


Disable the IPv6 feature, because using 0.0.0.0 for the various networking-related Hadoop
configuration options will result in Hadoop binding to the IPv6 addresses. To disable it,
open the sysctl.conf file :
$ sudo nano /etc/sysctl.conf
Add the following lines at the end of the sysctl.conf file and reboot the machine.
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

You can check whether IPv6 is enabled on your machine with the following command :
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6

A return value of 0 means IPv6 is enabled and a value of 1 means disabled.


Step 6 - Download hadoop


$ wget
http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

Extract the Hadoop archive and move it to the /usr/local directory :

$ tar xvzf hadoop-2.6.0.tar.gz
$ sudo mv hadoop-2.6.0 /usr/local/hadoop

Step 7 - Assign root privileged to hduser


$ sudo adduser hduser sudo
$ sudo chown -R hduser:hadoop /usr/local/hadoop


Step 8 - Setup Configuration Files

The following files will have to be modified to complete the Hadoop setup :
1. ~/.bashrc
2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
3. /usr/local/hadoop/etc/hadoop/core-site.xml
4. /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
1. Configure the bashrc file
We need to find the path where Java has been installed to set the JAVA_HOME
environment variable in the bashrc file. So open the bashrc file.


Append the following lines at the end of bashrc file.

#HADOOP VARIABLES START


export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

Source the file and check the Java version.

$ source ~/.bashrc
$ javac -version
$ which javac
$ readlink -f /usr/bin/javac


2. Configure hadoop-env.sh to set JAVA_HOME

Export the path of Java Home


export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64


3. Configure core-site.xml file

The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties
that Hadoop uses when starting up. This file can be used to override the default settings
that Hadoop starts with. So create a temp directory inside Hadoop and assign it to hduser,
then open the core-site.xml file :
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
$ sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml


Add the following lines inside <configuration> section


<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>

4. Configure mapred-site.xml file

By default, the /usr/local/hadoop/etc/hadoop/ folder contains the
/usr/local/hadoop/etc/hadoop/mapred-site.xml.template file, which has to be
renamed/copied to mapred-site.xml. So copy the file and open it for
configuration.

$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-
site.xml
$ sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml


Add following lines inside <Configuration> section

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>


5. Configure hdfs-site.xml

The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file needs to be configured for each
host in the cluster that is being used. It is used to specify the directories which will be
used as the name node and the data node on that host. So first create directories under hdfs
for the name node, data node and HDFS store.
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
$ sudo chown -R hduser:hadoop /usr/local/hadoop_store

Open hdfs-site.xml for configuration

$ sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml


Add the following lines under <configuration> section


<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
</configuration>

Step 9 - Format the New Hadoop Filesystem

The Hadoop file system needs to be formatted so that we can start to use it.
$ hadoop namenode -format


Step 10 - Start all services of hadoop to use it.

There are two ways to invoke the command that starts all the services of Hadoop (the second
form is used from inside Hadoop's sbin directory). They are given as follows.
$ start-all.sh
$ ./start-all.sh


To verify that all the services are running, run the jps command. If the output of the jps command
is similar to the following, then we can say that Hadoop is successfully installed.

8.2 Write a word count program to demonstrate the use of Map and Reduce tasks.

In this practical, a single-node Hadoop cluster is used. A Hadoop cluster with pre-installed
Eclipse on CentOS is used for running the MapReduce program. The steps
to run the word count program using the MapReduce framework are as follows.


Step 1 - Open Eclipse and create a new Java project, specify the name and click on Finish.


Step 2 - Right click on the project and create a new package named wordcount

Step 3 - Right click on the package name wordcount, create a new class in it and assign
the name wordcount

Step 4 - Write the MapReduce program for word count within that class

package wordcount;

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;


import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class wordcount


{
public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable>
{
public void map(LongWritable key, Text value, Context con) throws IOException,
InterruptedException
{
String line = value.toString();

StringTokenizer token = new StringTokenizer(line);

while(token.hasMoreTokens())
{
String status = new String();
String word = token.nextToken();
Text outputKey = new Text(word);
IntWritable outputValue = new IntWritable(1);
con.write(outputKey, outputValue);
}
} // end of map()
} //end of Mapper Class

public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable>


{
public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException,
InterruptedException
{

int sum = 0;

for(IntWritable value : values)


{
sum += value.get();
}

con.write(word, new IntWritable(sum));



} // end of reduce()
} // end of Reducer class

// job definition

public static void main(String[] args) throws Exception


{

Configuration c = new Configuration();


String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
Path input = new Path(files[0]);
Path output = new Path(files[1]);
Job j = new Job(c, "wordcount");
j.setJarByClass(wordcount.class);
j.setMapperClass(MapForWordCount.class);
j.setReducerClass(ReduceForWordCount.class);
j.setOutputKeyClass(Text.class);
j.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(j, input);
FileOutputFormat.setOutputPath(j, output);
System.exit(j.waitForCompletion(true) ? 0:1);

} // end of main()

} //end of main class

Step 5 - Add the required jar files to resolve errors

To add jar files, right click on the class file, select the Build Path option and open the Configure
Build Path window. To add the essential libraries, click on the Add External JARs button and add
three jar files one by one. Here we need three jar files, namely hadoop-core.jar,
commons-cli-1.2.jar and core-3.1.1.jar.


Step 6 - Once all the errors have been resolved, right click on the project, select Export jar
file, specify a name for it and click on Finish.


Step 7 - Create the input text file and copy both the input file and the jar file to the Hadoop directory

Step 8 - Run the program using following command


$ hadoop jar jar-name.jar package.class input-file(s) output-directory
In our program the jar file name is word.jar, the package name is wordcount, the class name is
wordcount and the input file name is inputtsec.txt. So the command will be
$hadoop jar word.jar wordcount.wordcount hadoop/inputtsec.txt hadoop/output002/

Step 9 - Check the output

To see the output, open the part file which lies inside the output002 directory.


Lab 9 : Explore Storage as a service using own Cloud for remote file access using
web interfaces.

ownCloud is a suite of client-server software for creating and using file hosting services.
ownCloud is functionally very similar to the widely used Dropbox, with the primary
functional difference being that the Server Edition of ownCloud is free and open-source,
thereby allowing anyone to install and operate it without charge on a private server.
It also supports extensions that allow it to work like Google Drive, with online document
editing, calendar and contact synchronization, and more. Its openness avoids enforced
quotas on storage space or the number of connected clients, instead having hard limits
(like on storage space or number of users) defined only by the physical capabilities of the
server.

Installation and configuration of ownCloud

ownCloud can be installed over any flavor of Linux like Ubuntu, CentOS, Fedora etc.,
but Ubuntu is preferable. The steps for installation are as follows.

Step 1 - Installing ownCloud

The ownCloud server package does not exist within the default repositories for Ubuntu.
However, ownCloud maintains a dedicated repository for the distribution that we can
add to our server.
To begin, download their release key using the curl command and import it with the apt-key
utility using the add command :
$ curl https://download.owncloud.org/download/repositories/10.0/Ubuntu_18.04/Release.key | sudo
apt-key add -


The 'Release.key' file contains a PGP (Pretty Good Privacy) public key which apt will use
to verify that the ownCloud package is authentic.
Now execute following commands on the terminal
1) $ echo 'deb http://download.owncloud.org/download/repositories/10.0/Ubuntu_18.04/ /' | sudo
tee /etc/apt/sources.list.d/owncloud.list
2) $sudo apt update
3) $ sudo apt install php-bz2 php-curl php-gd php-imagick php-intl php-mbstring php-xml php-zip
owncloud-files

Step 2 - Set the Document Root

The ownCloud package we installed copies the web files to /var/www/owncloud on the server. Currently, the Apache virtual host configuration is set up to serve files out of a different directory, so we need to change the DocumentRoot setting in the configuration to point to the new directory. First, find the virtual host file that serves your domain or IP :
$ sudo apache2ctl -t -D DUMP_VHOSTS | grep server_domain_or_IP

Now edit the configuration file and change the DocumentRoot directive so that it points to the /var/www/owncloud directory :
$ sudo nano /etc/apache2/sites-enabled/server_domain_or_IP.conf

<VirtualHost *:80>
...
DocumentRoot /var/www/owncloud
...
</VirtualHost>

When you are finished, check the syntax of your Apache files to make sure there are no detectable errors in your configuration
$ sudo apache2ctl configtest
Output - Syntax OK
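After the syntax check passes, Apache generally has to be restarted (or reloaded) for the new DocumentRoot to take effect; on Ubuntu this can be done with :
$ sudo systemctl restart apache2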

Step 3 - Configuring the MySQL Database

Open the MySQL prompt, create a database and execute the following commands.

1) $ mysql -u root -p
2) mysql> CREATE DATABASE owncloud;
3) mysql> GRANT ALL ON owncloud.* TO 'owncloud'@'localhost' IDENTIFIED BY 'owncloud_database_password';
4) mysql> FLUSH PRIVILEGES;
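Note that on MySQL 8.0 and later the IDENTIFIED BY clause is no longer accepted inside GRANT; in that case the user has to be created first. A sketch using the same placeholder password as above :
mysql> CREATE USER 'owncloud'@'localhost' IDENTIFIED BY 'owncloud_database_password';
mysql> GRANT ALL ON owncloud.* TO 'owncloud'@'localhost';
mysql> FLUSH PRIVILEGES;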

Step 4 - Configure ownCloud

To access the ownCloud web interface, open a web browser and navigate to the server's IP address as shown below.
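As an alternative to the browser wizard, ownCloud can also be configured from the command line using its occ tool. The following is only a sketch : it assumes the web server user is www-data and reuses the placeholder database credentials created in Step 3, and the admin user name and password shown are examples; option names may vary slightly between versions.
$ cd /var/www/owncloud
# non-interactive install; values below are placeholders from the earlier steps
$ sudo -u www-data php occ maintenance:install \
    --database "mysql" --database-name "owncloud" \
    --database-user "owncloud" --database-pass "owncloud_database_password" \
    --admin-user "admin" --admin-pass "admin_password"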


The ownCloud portal has two types of users, namely the admin user and the local user. The admin user can create users/groups, assign storage quotas, assign privileges and manage user and group activities.


The local user is a restricted user who can perform local activities such as uploading or sharing files, deleting local shares, creating shares, etc.


An alternative way to use ownCloud is to download the ready-made virtual machine from https://bitnami.com/stack/owncloud/cloud, which can be run directly on a virtualization platform like VirtualBox or VMware Workstation.

Lab 10 : To create and access a Windows virtual machine using AWS EC2.

[Note : The following three labs are performed on an AWS Free Tier account, which is free for almost everyone. So please create an AWS Free Tier account from https://aws.amazon.com/free/ ].
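These labs use the AWS web console. The same operations can also be scripted with the AWS CLI, which must first be installed and configured with valid credentials; a minimal sketch for Ubuntu :
$ sudo apt install awscli
$ aws configure    # enter the access key, secret key, default region and output format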
The steps to create and access a Windows virtual machine using AWS EC2 are as follows

Step 1 - Log in to the AWS portal and select the EC2 service from the admin console.


Step 2 - The EC2 resource page will appear, showing a summary of instances. Now click on Launch Instance to select the VM instance type.

Step 3 - Select the operating system in the form of an AMI (Amazon Machine Image). In this example we have selected a Windows Server instance which is eligible for the free tier, and click on Next.


Step 4 - Now select the hardware type for the virtual machine. In this example we have selected the free-tier-eligible General Purpose hardware, and click on Next.

Step 5 - Now specify the instance details such as the number of instances and networking options like VPC, subnet or auto-assign public IP, etc., and click on Next.

Step 6 - Specify the storage space for the VM and click on Next.


Step 7 - Click on Add Tag to specify the VM name and click on Next.

Step 8 - Configure a security group to provide access to the VM using different protocols. In this example we have selected the default RDP protocol.

Step 9 - Now review the instance and click on the Launch button.


Step 10 - Now, to secure the VM instance, create a key pair : AWS encrypts the administrator password with the public key, and the private key is needed to decrypt it. Here, specify a key pair name and download the key pair.

Step 11 - Finally, click on Launch Instances to launch the VM.


Step 12 - Now, from the summary page, click on View Instances to see the instance state. After some time you will see the running instance of your VM.


Step 13 - Now click on Connect to get the password for the VM so that it can be accessed over the RDP protocol.

Step 14 - Select the downloaded key pair file to decrypt the password.


Step 15 - Now connect to the instance using an RDP tool, supplying the IP address/DNS name, the username and the password decrypted in the last step.


Step 16 - Once you click on connect, you will see the running Windows virtual machine
as shown below.

Step 17 - You can shut down the instance by selecting Instance State followed by Stop.

Step 18 - You can delete the instance permanently by selecting Instance State followed by Terminate.
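The same lifecycle can also be driven from the AWS CLI. The commands below are only a sketch : the AMI ID, security group ID, instance ID and key pair name are placeholders, not values taken from the console walkthrough above.
# create a key pair and save the private key (placeholder name)
$ aws ec2 create-key-pair --key-name my-keypair --query 'KeyMaterial' --output text > my-keypair.pem
# launch a Windows instance (the AMI ID is region-specific and a placeholder here)
$ aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t2.micro \
      --key-name my-keypair --security-group-ids sg-xxxxxxxx --count 1
# retrieve the administrator password, decrypted with the private key
$ aws ec2 get-password-data --instance-id i-xxxxxxxx --priv-launch-key my-keypair.pem
# stop or permanently terminate the instance
$ aws ec2 stop-instances --instance-ids i-xxxxxxxx
$ aws ec2 terminate-instances --instance-ids i-xxxxxxxx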


Lab 11 : To host a WordPress website using the Lightsail service in AWS.

Step 1 - Open the admin console of AWS and select the Lightsail service.

Step 2 - Select Create instance option.

Step 3 - Select the Linux hosting instance.


Step 4 - Select WordPress hosting.

Step 5 - Specify a name for the instance.


Step 6 - Now click on Create to launch the instance.

Step 7 - Click on Connect to connect to the instance and get the password for WordPress.


Step 8 - Now open the bitnami_application_password file to get the admin password, then copy it for use on the admin console.

Step 9 - Now reserve a static IP by selecting the Networking option and creating a static IP.


Once the static IP is allocated, open that IP in a browser to see the WordPress website.


Open the admin console of WordPress and use the password obtained in Step 8 to open the WordPress site builder. Now you can develop a complete WordPress website and use it.
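The same setup can also be scripted with the AWS CLI. The following is only a sketch : the instance name, availability zone, bundle ID and static IP name are illustrative placeholders.
# create a WordPress instance from the Lightsail blueprint (zone and bundle are placeholders)
$ aws lightsail create-instances --instance-names my-wordpress \
      --availability-zone us-east-1a --blueprint-id wordpress --bundle-id nano_2_0
# allocate a static IP and attach it to the instance
$ aws lightsail allocate-static-ip --static-ip-name my-static-ip
$ aws lightsail attach-static-ip --static-ip-name my-static-ip --instance-name my-wordpress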


Lab 12 : To demonstrate Storage as a service using Amazon S3.

Step 1 - Open the AWS console and open the S3 service.

Step 2 - Now, click on Create bucket to create a storage container to store the user's data.

Specify a name for the bucket along with the region where you want to create it.


In the next screen, select the versioning and tag options if required; otherwise click on Next.


In the next screen, set the public access settings for the bucket and the associated files as per requirements.

Finally, click on Create bucket to create an empty bucket.


To store data such as files in the bucket, click on the Upload button.

Now, click on Add files to add files from the local computer, followed by clicking the Upload button.

During the upload, set the user access permissions and storage classes if required. Upon successful upload, the file stored in the S3 bucket is shown below.


By opening a file, you can view its different attributes and the object URL through which users can download that file once it has been made public.
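The same operations can also be performed from the AWS CLI. The following is only a sketch : the bucket name, region and file name are placeholders, and bucket names must be globally unique.
# create a bucket (placeholder name and region)
$ aws s3 mb s3://my-example-bucket --region us-east-1
# upload a local file and list the bucket contents
$ aws s3 cp myfile.txt s3://my-example-bucket/
$ aws s3 ls s3://my-example-bucket/
# generate a time-limited download URL instead of making the object public
$ aws s3 presign s3://my-example-bucket/myfile.txt --expires-in 3600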





Model Question Paper


Cloud Computing
Semester - VII
(As per New Question Paper Pattern)
[Time : 2 Hours Total Marks : 100]

Instructions :
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.
Part – A (10 × 2 = 20 Marks)
Q.1 Highlight the importance of Cloud Computing.

Q.2 Explain the term “Elasticity in cloud computing”.

Q.3 Compare between different implementation levels of virtualization.

Q.4 Give the significance of virtualization.


Q.5 What is Amazon S3 ?

Q.6 What are the core components of Google app engine architecture ?

Q.7 What is the purpose of Open Authentication in the cloud computing ?

Q.8 How can the data security be enforced in cloud ?

Q.9 Mention the importance of Transport Level Security (TLS).

Q.10 What is Openstack ? Enlist its important components.


Part – B (13 × 5 = 65 Marks)
Q.11 a) Outline the similarities and differences between distributed computing, Grid computing
and Cloud computing.

OR

b) Explain in detail web services protocol stack and publish-subscribe models with respect to
web services.

Q.12 a) What is virtualization ? Describe para and full virtualization architectures, compare and
contrast them.

OR

b) Explain in brief NIST cloud computing reference architecture.



Q.13 a) Describe Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS) with examples.
OR
b) Give a detailed note on Hadoop framework.
Q.14 a) Write a detailed note on storage-as-a-service.
OR

b) Analyze how the MapReduce framework supports parallel and distributed computing on large data sets with a suitable example.
Q.15 a) Explain the functional architecture of the Google cloud platform for app engine in detail.

OR

b) Explain the baseline Identity and Access Management (IAM) factors to be practised by the stakeholders of cloud services, and the common key privacy issues likely to occur in the cloud environment.
Part – C (15 × 1 = 15 Marks)

Q.16 a) Write detailed note on Resource Provisioning along with different Resource Provisioning
Methods.

OR

b) Write a short note on levels of federation in cloud.



