Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 242

Chapter 4

Open Source cloud Implementation


and Administration
Cloud Roles
Cloud Roles
1. Managers
– Availability of cloud resources
– Quality of cloud services
– Cloud usage billing and costing
– Establishing IT processes and best practices
2. Administrators
– Daily production and operational support of cloud platform
– Continuous monitoring and status reporting of cloud
platform
– Maintaining service level agreements
Cloud Roles
3. Application Architects
– Developing and adapting applications to cloud deployments
– Information management and adapting data management to
cloud deployments
– Cloud Service design, implementation, and lifecycle support
4. Users
– On-demand provisioning of compute, network, and storage
resources
– Self-service configuration of cloud resources
– Transparency on service costs and levels
OPEN STACK
What is OpenStack
• Open source cloud computing platform that used for
building private and public clouds
• Aims for simple implementation, massive scalability, and a
rich set of features.
• Provides IaaS solution through a variety of services.
• Each service offers an API that facilitates integration.
• Designed for flexibility and many different use cases
• Enables multi-tenancy
• Quota for different users
• Users can be associated with multiple tenants
OpenStack provides
• Provides virtual machines (VM) on demand
• Self service provisioning
• Snapshotting capability
• Storage volumes
• Block storage for VM images
• Object storage for VM images and arbitrary files
Why is OpenStack
• Launched by NASA and Rackspace in 2010
• Open Source Cloud software
• Managed by the OpenStack Foundation
• Rapidly taking over the Cloud world
• Designed to scale cost effectively
• Emerging standard backed by large ecosystem
• Open source with no lock-in or license
• Massively scalable
Why is OpenStack : Massively scalable
Features of OpenStack
• Live upgrades:
• Upgrade controller infrastructure
• Computer node

• Federated Identity
– Logging into multiple open stack nodes through single
user ID
– Special request from europian organization for nuclear
research(CERN)
Features of OpenStack
• Trove
– manage database resources like MySQL for manipulating
users and schema
– Manipulation is done through Trove APIs
– Original term use is Project Red Dwarf
– MongoDB, , Cassandra, etc

• Object Storage Replication


– Known as ssync used for intercepting requests that are
forwarded or coming out of Swift(object storage system)
– New mechanism for replicating the object store
Other Features of OpenStack
• Compute service
• Networking
• Dashboard
• Identity service
• Image management service
• Orchestration
• Cloud orchestration is the use of programming technology to
manage the interconnections and interactions among
workloads on public and private cloud infrastructure. It
connects automated tasks into a cohesive workflow to
accomplish a goal, with permissions oversight and policy
enforcement.
OpenStack Community
Today
Services of OpenStack
The Services of OpenStack
Sr No Components/ Service Project Name
1 Compute Nova
2 Object Storage Swift
3 Block Storage Cinder
4 Networking Neutron
5 Dashboard Horizon
6 Identity Service Keystone
7 Image Service Glance
8 Telemetry Ceilometer
9 Orchestration Heat
Conceptual Architecture
Components of OpenStack
1. Compute(Nova)
•The Cloud operating system enables enterprises and service
providers to offer on-demand computing resources, by
provisioning and managing large networks of virtual machines.
•Compute resources are accessible via APIs for developers
building cloud applications and via web interfaces for
administrators and users.
•The compute architecture is designed to scale horizontally on
standard hardware
• Nova works as a fabric controller in the cloud
computing environment
• Fabric controller is the primary part of
construction of IaaS system
• Nova is coded in python but various external
libraries are used.e.g Eventlet,Kombu and
SQLAlchemy
• The libraries add concurrent programming, DB
access, AMQP ( Advanced Message Queuing
Protocol )communication for business
• Objective of designing nova is to automate and
manage pools of compute resources.
Components of OpenStack
2. Object Storage(Swift)
•OpenStack provides redundant, scalable object storage
using clusters of standardized servers capable of storing
petabytes of data
•Object Storage is not a traditional file system, but rather a
distributed storage system for static data such as virtual
machine images, photo storage, email storage, backups and
archives. Having no central "brain" or master point of control
provides greater scalability, redundancy and durability.
Components of OpenStack
2. Object Storage(Swift)
•Objects and files are written to multiple disk drives
spread throughout servers in the data center, with the
OpenStack software responsible for ensuring data
replication and integrity across the cluster.
•Storage clusters scale horizontally simply by adding
new servers. OpenStack replicates its content from
other active nodes to new locations in the cluster.
Because
• OpenStack uses software logic to ensure data
replication and distribution across different devices
Components of OpenStack
2. Object Storage(Swift)
•This enables you to use economical hard drive and
server for storing data
•If you require storage system that provides scaling
facility within economical cost Swift is the ideal solution
for you.
•Swift storage platform is completely distributed and
can be accessed through APIs.
•You can integrate it into your application for backing
up archiving and retaining data
Components of OpenStack
2. Object Storage(Swift)
•the block storage facility permits you to expose block
devices.
•You can connect these devices to compute instances
for expanding the storage, enhancing performance.
Components of OpenStack
3. Block Storage(Cinder)
•OpenStack provides persistent block level storage
devices for use with OpenStack compute instances.
•The block storage system manages the creation,
attaching and detaching of the block devices to servers.
•Block storage volumes are fully integrated into
OpenStack Compute and the Dashboard allowing for
cloud users to manage their own storage needs.
Components of OpenStack
3. Block Storage(Cinder)
•In addition to using simple Linux server storage, it has
unified storage support for numerous storage platforms
including Ceph, NetApp, Nexenta, SolidFire, and Zadara.
•Block storage is appropriate for performance sensitive
scenarios such as database storage, expandable file
systems, or providing a server with access to raw block
level storage.
•Snapshot management provides powerful functionality
for backing up data stored on block storage volumes.
Components of OpenStack
4. Networking(Neutron)
•Neutron provides the networking capability for
OpenStack. It helps to ensure that each of the
components of an OpenStack deployment can
communicate with one another quickly and efficiently.
•OpenStack provides flexible networking models to suit
the needs of different applications or user groups.
•Standard models include flat networks or VLANs for
separation of servers and traffic.
Components of OpenStack
4. Networking(Neutron)
•It manages IP addresses, allowing for dedicated static
IPs or DHCP. (dynamic host config protocol)
•Floating IPs allow traffic to be dynamically rerouted to
any of your compute resources, which allows you to
redirect traffic during maintenance or in the case of
failure. 
•Users can create their own networks, control traffic and
connect servers and devices to one or more networks.
Components of OpenStack
4. Networking(Neutron)
•Administrators can take advantage of software-defined
networking (SDN) technology like OpenFlow to allow for
high levels of multi-tenancy and massive scale.
•OpenStack Networking has an extension framework
allowing additional network services, such as intrusion
detection systems (IDS), load balancing, firewalls and
virtual private networks (VPN) to be deployed and
managed.
• A floating IP address is a service provided by
Neutron. It's not using any DHCP service or
being set statically within the guest. As a
matter of fact, the guest's operating system
has no idea that it was assigned a floating IP
address.
• The delivery of packets to the interface with
the assigned floating address is the
responsibility of Neutron's L3 agent. Instances
with an assigned floating IP address can be
accessed from the public network by the
floating IP.
• A floating IP address and a private IP address
can be used at the same time on a single
network interface.
• The private IP address is likely to be used for
accessing the instance by other instances in
private networks while the floating IP address
would be used for accessing the instance from
public networks.
• A setup with 2 compute nodes(A compute
node provides the ephemeral storage,
networking, memory, and processing
resources that can be consumed by virtual
machine instances. ), one Neutron controller
(where the Neutron service, DHCP agent and
l3 agent run), a physical router and a user.
• Let the physical subnet be 10.0.0.0/24. On the
compute nodes instances are running using
the private IP range 192.168.1.0/24. One of
the instances is a webserver that should be
reachable from a public network. Network
outline:
Floating IP
https://www.quora.com/What-are-floating-IP-Address
• As shown in the picture above, the webserver
is running on an instance with private IP
192.168.1.2. A User from network 10.0.0.0/24
wants to access the webserver but he's not
part of private network 192.168.1.0/24.
• Using floating IP address 10.0.0.100 enables
the user to fetch webpages from the
webserver. The destination address is
translated by the NAT table (IP tables) within
the virtual router deployed on the controller.
Components of OpenStack
5. Dashboard(Horizon)
•Dashboard provides administrators and users a graphical
interface to access, provision and automate cloud-based
resources. The extensible design makes it easy to plug in
and expose third party products and services, such as
billing, monitoring and additional management tools. The
dashboard is also brandable for service providers & other
commercial vendors who want to make use of it.
•The dashboard is just one way to interact with OpenStack
resources. Developers can automate access or build tools
to manage their resources using the native OpenStack API
Components of OpenStack
6. Identity Services(Keystone)

•OpenStack Identity provides a central directory of


users mapped to the OpenStack services they can
access.

•It acts as a common authentication system across the


cloud operating system and can integrate with existing
backend directory services like LDAP. (Lightweight
Directory Access Protocol )
Components of OpenStack
6. Identity Services(Keystone)
•LDAP (Lightweight Directory Access Protocol) is a
software protocol for enabling anyone to locate
organizations, individuals, and other resources such as
files and devices in a network, whether on the public 
Internet or on a corporate intranet. 

•It supports multiple forms of authentication including


standard username and password credentials, token-
based systems and AWS-style logins.
Components of OpenStack
6. Identity Services(Keystone)
•Additionally, the catalog provides a query able list of all of
the services deployed in an OpenStack cloud in a single
registry. Users and third-party tools can programmatically
determine which resources they can access.
As an administrator, OpenStack Identity enables you to:
•Configure centralized policies across users and systems
•Create users and tenants and define permissions for
compute, storage and networking resources using role-based
access control (RBAC) features
Components of OpenStack
6. Identity Services(Keystone)
•Integrate with an existing directory like LDAP, allowing
for a single source of identity authentication across the
enterprise
As a user, OpenStack Identity enables you to:
•Get a list of the services that you can access
•Make API requests or log into the web dashboard to
create resources owned by your account
Components of OpenStack
7. Image Service(Glance)
•The OpenStack Image Service provides discovery,
registration and delivery services for disk and server
images.
•The ability to copy or snapshot a server image and
immediately store it away is a powerful capability of
the OpenStack cloud operating system.
•Stored images can be used as a template to get new
servers up and running quickly— and more consistently
if you are provisioning multiple servers— than installing
a server operating system
Components of OpenStack
and individually configuring additional services. It can
also be used to store and catalog an unlimited number
of backups.
•The Image Service can store disk and server images in
a variety of back-ends, including OpenStack Object
Storage. 
•The Image Service API provides a standard REST
interface for querying information about disk images
and lets clients stream the images to new servers.
Capabilities of the Image Service include:
•Administrators can create base templates from which
their users can start new compute instances
Components of OpenStack
• Users can choose from available images, or create their own
from existing servers
• Snapshots can also be stored in the Image Service so that
virtual machines can be backed up quickly
• A multi-format image registry, the image service allows uploads
of private and public images in a variety of formats, including:
– VHD (Hyper-V)
– VDI (VirtualBox)
– qcow2 (Qemu/KVM)
– VMDK (VMWare)
– OVF (VMWare, others)
Components of OpenStack
8. Orchestration (Heat)
•OpenStack Orchestration is a template-driven engine that
allows application developers to describe and automate the
deployment of infrastructure.
•The flexible template language can specify compute, storage
and networking configurations as well as detailed post-
deployment activity to automate the full provisioning of
infrastructure as well as services and applications.
•Through integration with the Telemetry service, the
Orchestration engine can also perform auto-scaling of certain
infrastructure elements.
Components of OpenStack
9. Telemetry(Ceilometer)
•The OpenStack Telemetry service aggregates usage and
performance data across the services deployed in an
OpenStack cloud.
•This powerful capability provides visibility and insight
into the usage of the cloud across dozens of data points
and allows cloud operators to view metrics globally or by
individual deployed resources.
•Monitors and meters the OpenStack cloud for billing,
benchmarking, scalability, and statistical purposes
OpenStack Services
Project
Service Description
name
Provides a web-based self-service
portal to interact with underlying
OpenStack services, such as
Dashboard Horizon
launching an instance, assigning IP
addresses and configuring access
controls.
Manages the lifecycle of compute
instances in an OpenStack
environment. Responsibilities
Compute Nova
include spawning, scheduling and
decommissioning of virtual
machines on demand.
OpenStack Services
Project
Service Description
name
Enables Network-Connectivity-
as-a-Service for other OpenStack
services, such as OpenStack
Compute. Provides an API for
users to define networks and the
Networking Neutron
attachments into them. Has a
pluggable architecture that
supports many popular
networking vendors and
technologies.
OpenStack Services
Storage
Stores and retrieves arbitrary
unstructured data objects via a 
RESTful, HTTP based API. It is highly
Object St Swift fault tolerant with its data replication
orage and scale out architecture. Its
implementation is not like a file server
with mountable directories.
Provides persistent block storage to
running instances. Its pluggable driver
Block Sto Cinder architecture facilitates the creation
rage and management of block storage
devices.
OpenStack Services
Shared services
Provides an authentication and
authorization service for other
Identity Keystone OpenStack services. Provides a
service catalog of endpoints for all
OpenStack services.
Stores and retrieves virtual
Image machine disk images. OpenStack
Glance
Service Compute makes use of this during
instance provisioning.
Monitors and meters the OpenStack
Telemet
Ceilometer cloud for billing, benchmarking,
ry
scalability, and statistical purposes.
OpenStack Services
Higher-level services
Orchestrates multiple composite
cloud applications by using either
the native HOT template format or
the AWS CloudFormation template
Orchestration Heat
format, through both an
OpenStack-native REST API and a
CloudFormation-compatible Query
API.
Provides scalable and reliable
Cloud Database-as-a-Service
Database
Trove functionality for both relational
Service
and non-relational database
engines.
Logical Openstack Architecture
Logical Openstack Architecture
• To design, deploy, and configure OpenStack, administrators
must understand the logical architecture.
• It consists of several independent parts, named the OpenStack
services. All services authenticate through a common Identity
service. Individual services interact with each other through
public APIs, except where privileged administrator commands
are necessary.
• Internally, OpenStack services are composed of several
processes. All services have at least one API process, which
listens for API requests, preprocesses them and passes them on
to other parts of the service. With the exception of the Identity
service, the actual work is done by distinct processes.
Modes of operation
• Single host mode- based on central server

• Multi host mode- if a copy of the network is


run on each of the compute nodes and the
nodes are used as the Internet gateway by the
instances that are running on individual
nodes.
• The floating IPs and security groups are also
hosted on these compute node
Cloud Programming
Outline
• Introduction
• Programming Support for Google Apps engine
• Google File System
• Big Table as Googles NoSQL System
• Chubby as Google Distributed Lock Service
• Programming Support for Amazon EC2
• Amazon S3
• EBS
• Simple DB
• Summary
04/21/20 54
Cloud Programming

• Best Cloud programming practices requires attention


to certain things includes
– Knowing your Tools
– Designing Scalable Applications
– Making Application Secure
– Reviewing the Architecture of the Software
– Designing Infrastructure for Dynamism

04/21/20 55
Google App Engine

• GAE is a PaaS cloud computing platform for


developing and hosting web applications in Google-
managed data centers.
• Google App Engine lets you run web applications on
Google's infrastructure.

• Features
– Easy to build.
– Easy to maintain.
– Easy to scale as the traffic and storage needs grow.

04/21/20 56
Google App Engine
• GAE makes it easy to build and deploy an application that
runs reliably even under heavy load and with large
amounts of data.
• Features:
– Persistent storage with queries, sorting, and transactions.
– Automatic scaling and load balancing.
– Asynchronous task queues for performing work outside the
scope of a request.
– Scheduled tasks for triggering events at specified times or
regular intervals.
– Integration with other Google cloud services and APIs.
04/21/20 57
Google App Engine: Is it free?

• Yes
• free for upto 1 GB of storage and enough CPU
• bandwidth to support 5 million page views a
month
• 10 Applications per Google account

04/21/20 58
GAE: Programming
languages support
• Google App Engine supports apps written in a
variety of programming languages.
• Java
• Python
• PHP
• Go

04/21/20 59
GAE: Programming languages
support for java
• Java:
• App Engine runs JAVA apps on a JAVA 7 virtual machine
(currently supports JAVA 6 as well).
• Uses JAVA Servlet standard for web applications:
• WAR (Web Applications ARchive) directory structure.
• Servlet classes
• Java Server Pages (JSP)
• Static and data files
• Deployment descriptor (web.xml)
• Other configuration files
• Getting started :
– https://developers.google.com/appengine/docs/java/getti
04/21/20
ngstarted/ 60
GAE: Programming languages
support for python
Python:
• Uses WSGI (Web Server Gateway Interface) standard.
• Python applications can be written using:
• Webapp2 framework
• Django framework
• Any python code that uses the CGI (Common Gateway
Interface) standard.
•Getting started :
– https://developers.google.com/appengine/docs/python/g
ettingstartedpython27/

04/21/20 61
GAE: Programming languages
support for PHP
• PHP (Experimental support):
• Local development servers are available to anyone for
developing and testing local applications.
• Only whitelisted applications can be deployed on Google App
Engine. (https://gaeforphp.appspot.com/).

• Getting started:
https://developers.google.com/appengine/docs/php/

04/21/20 62
Google App Engine: Programming
languages support for java
• Google’s Go:
• Go is an Google’s open source programming environment.
• Tightly coupled with Google App Engine.
• Applications can be written using App Engine’s Go SDK.

• Getting started:
https://developers.google.com/appengine/docs/go/overview

04/21/20 63
When to use GAE

• You don’t want to get troubled for setting up a server.


• You want instant for-free nearly infinite scalability
support.
• Your application’s traffic is spiky and rather
unpredictable.
• You don't feel like taking care of your own server
monitoring tools.
• You need pricing that fits your actual usage and isn't
time-slot based (App engine provides pay-per-use cost
model).
• You are able to work without direct access to local file
system.
04/21/20 64
Steps for deployment on GAE
Google developers console
•Create Account with google
•Create new project with projectID(unique)
Local machine
•Install python SDK
•Install Google App Engine SDK
•Create directory for your project and Create .py and .yaml file
inside it
•Start webserver
•Deploy on Google cloud
04/21/20 65
Programming Support for
Google Apps engine
• Google provides programming support for
its cloud environment, Google Apps
Engine through

‫ ־‬Google File System(GFS),


‫ ־‬Big Table as Googles NoSQL System and
‫ ־‬Chubby as Google Distributed Lock Service

04/21/20 66
Google File System
(GFS)
GFS
• Google File System is a scalable distributed file system
for large distributed data-intensive applications.
• It provides fault tolerance while running on
inexpensive commodity hardware (Commodity
hardware is a term for affordable devices that are
generally compatible with other such devices.
• In a process called commodity computing or
commodity cluster computing, these devices are often
networked to provide more processing power when
those who own them cannot afford to purchase more
elaborate supercomputers, or want to maximize
savings in IT design.),
04/21/20 68
GFS
• and it delivers high aggregate performance to a large
number of clients.
• GFS shares many of the same goals as previous
distributed file systems such as performance,
scalability, reliability, and availability.
GFS has designed with certain assumptions that also
provide opportunities for developers researchers
• Automatic recovery from component failure on
routine basis
• Efficient storage support for large size files as huge
amount of data to be processed is stored in these file
Storage support for small size files without requiring
04/21/20 69
any optimization for them
• With the workloads mainly consisting of 2
large streaming reads and small random reads
the system should be performance conscious
(aware of and responding to one's) so that
small reads are made steady rather than going
back and forth
• Semantics are defined well
• Atomicity is maintained least overhead due to
synchronization
• Provision for sustained bandwidth is given
priority than latency(the delay before a
transfer of data begins following an instruction
for its transfer.)
• Google takes the assumption listed above into
consideration and support its cloud platform,
Google App Engine through GFS
04/21/20 72
3.2. Masterservers.
Chunk
      
       servers
3.      Chunk servers.

GFS Architecture
• GFS is clusters of computers. A cluster is
simply a network of computers. Each cluster
might contain hundreds or even thousands of
machines. In each GFS clusters there are three
main entities:

04/21/20 73
http://programming-project.blogspot.com/2014/04/general-architecture-of-
google-file.html

1.      Clients
2.      Master servers
3.      Chunk servers
• Client can be other computers or computer
applications and make a file request.
• Requests can range from retrieving and
manipulating existing files to creating new
files on the system.
• Clients can be thought as customers of the
GFS
• Master Server is the coordinator for the cluster. Its task
include:-
• Maintaining an operation log, that keeps track of the
activities of the cluster.
• The operation log helps keep service interruptions to a
minimum if the master server crashes, a replacement
server that has monitored the operation log can take its
place.

• 2. The master server also keeps track of metadata,


which is the information that describes chunks. The
metadata tells the master server to which files the
• The master server also keeps track of
metadata, which is the information that
describes chunks.
• The metadata tells the master server to which
files the chunks belong and where they fit
within the overall file.
• Chunk Servers are the workhorses of the GFS.
They store 64-MB file chunks.
• The chunk servers don't send chunks to the
master server. Instead, they send requested
chunks directly to the client.
• The GFS copies every chunk multiple times
and stores it on different chunk servers. Each
copy is called a replica.
• By default, the GFS makes three replicas per
chunk, but users can change the setting and
make more or fewer replicas if desired.
• Management done to overloading single
master in Google File System
• Having a single master enables the master to
make sophisticated chunk placement and
replication decisions using global knowledge.
• However, the involvement of master in reads
and writes must be minimized so that it does
not become a bottleneck.
• Clients never read and write file data through
the master.
• Instead, a client asks the master which chunk
servers it should contact.
• It caches this information for a limited time
and interacts with the chunk servers directly
for many subsequent operations.
Need For GFS

• Large Data Files


• Scalability
• Reliability
• Automation
• Replication of data
• Fault Tolerance

04/21/20 81
GFS Architecture
• A GFS cluster consists of a single master and multiple
chunkservers and is accessed by multiple clients.
• Each of these is typically a commodity Linux machine
running a user-level server process.
• Files are divided into fixed-size chunks. Each chunk is
identified by an immutable and globally unique 64 bit chunk
handle assigned by the master at the time of chunk
creation.
• Chunkservers store chunks on local disks as Linux files and
read or write chunk data specified by a chunk handle and
byte range.
04/21/20 82
GFS Architecture
• For reliability, each chunk is replicated on multiple
chunk servers. By default, GFS store three replicas,
though users can designate different replication
levels for different regions of the file namespace.
• The master maintains all file system metadata. This
includes the namespace, access control information,
the mapping from files to chunks, and the current
locations of chunks.

04/21/20 83
GFS Architecture
• Cluster Computing
Single Master
Multiple Chunk Servers
– Stores 64 MB file chunks

Multiple clients

04/21/20 84
GFS cluster

04/21/20 85
GFS cluster
• Single Master:
– Minimal Master Load.
– Fixed chunk Size.
– The master also predicatively provide chunk locations immediately
following those requested by unique id.
• Chunk Size :
– 64 MB size.
– Read and write operations on same chunk.
– Reduces network overhead and size of metadata in the master
• Operation Log:
– Keeps track of activities.
– It stores on multiple remote locations .
04/21/20 86
GFS cluster
• Types of Metadata:
– File and chunk namespaces
– Mapping from files to chunks
– Location of each chunks replicas
• In-memory data structures:
– Master operations are fast.
– Periodic scanning entire state is easy and efficient
• Chunk Locations:
– Master polls chunk server for the information.
– Client request data from chunk server.

04/21/20 87
GFS cluster
• Atomic Record Appends:
– GFS offers Record Append .
– Clients on different machines append to the same file concurrently.
– The data is written at least once as an atomic unit.
• Snapshot:
– It creates quick copy of files or a directory .
– Master revokes lease for that file
– Duplicate metadata
– On first write to a chunk after the snapshot operation
– All chunk servers create new chunk
– Data can be copied locally

04/21/20 88
GFS : Master Operation
 Namespace Management and Locking:
– GFS maps full pathname to Metadata in a table.
– Each master operation acquires a set of locks.
– Locking scheme allows concurrent mutations in same
directory.
– Locks are acquired in a consistent total order to prevent
deadlock.
 Replica Placement:
– Maximizes reliability, availability and network bandwidth
utilization.
– Spread chunk replicas across racks
04/21/20 89
GFS : Master Operation
 Create:
o Equalize disk utilization.
o Limit the number of creation on chunk server.
o Spread replicas across racks.
 Re-replication:
o Re-replication of chunk happens on priority.
 Rebalancing:
o Move replica for better disk space and load balancing.
o Remove replicas on chunk servers with below average free space.
 Data Integrity:
o Check sum every 64 MB block in each chunk.

04/21/20 90
GFS : Master Operation
 Garbage Collection:
o Makes system Simpler and more reliable.
o Master logs the deletion, renames the file to a hidden name.
 Stale Replica detection:
o Chunk version number identifies the stale replicas.
o Client or chunk server verifies the version number.
 High availability:
o Fast recovery.
o Chunk replication.
o Shadow Masters.

04/21/20 91
GFS Interaction

04/21/20 92
GFS Interaction

04/21/20 93
GFS Interaction
1. The client asks the master which chunkserver holds the current
lease for the chunk and the locations of the other replicas.
2. The master replies with the identity of the primary and the
locations of the other (secondary) replicas.
3. The client pushes the data to all the replicas. A client can do so
in any order.
4. Once all the replicas have acknowledged receiving the data, the
client sends a write request to the primary. The request
identifies the data pushed earlier to all of the replicas. The
primary assigns consecutive serial numbers to all the mutations
it receives, possibly from multiple clients, which provides the
necessary serialization.

04/21/20 94
GFS Interaction
5. The primary forwards the write request to all secondary
replicas. Each secondary replica applies mutations in the
same serial number order assigned by the primary.
6. The secondaries all reply to the primary indicating that they
have completed the operation.
7. The primary replies to the client. Any errors encountered at
any of the replicas are reported to the client. In case of
errors, the write may have succeeded at the primary and an
arbitrary subset of the secondary replicas. (If it had failed at
the primary, it would not have been assigned a serial
number and forwarded.)

04/21/20 95
GFS: Issues and solution
• More Infrastructure Requirements
– Disciplined programming approach
• Corruption of data
– Checksum for detection
• Single reader – writer problem
– Replace mmap() by pread() which requires an
extra copy of entire data

04/21/20 96
BigTable as Google’s NoSQL System:
A Distributed Storage System
Types of NoSQL databases-
There are 4 basic types of NoSQL databases:
•Key-Value Store – It has a Big Hash Table of keys
& values {Example- Riak, Amazon S3 (Dynamo)}
•Document-based Store- It stores documents
made up of tagged elements. {Example- CouchDB}
•Column-based Store- Each storage block contains
data from only one column, {Example- HBase,
Cassandra}
•Graph-based-A network database that uses
edges and nodes to represent and store data.
{Example- Neo4J}
Big Table
• Cloud Bigtable is a sparsely populated table that can scale
to billions of rows and thousands of columns, enabling you
to store terabytes or even petabytes of data.
• A single value in each row is indexed; this value is known
as the row key.
• Cloud Bigtable is ideal for storing very large amounts of
single-keyed data with very low latency.
• It supports high read and write throughput at low latency,
and it is an ideal data source for MapReduce operations

04/21/20 99
• You can use Cloud Bigtable to store and query
all of the following types of data:
Time-series data, such as CPU and memory
usage over time for multiple servers.
Marketing data, such as purchase histories and
customer preferences.
Financial data, such as transaction histories,
stock prices, and currency exchange rates.
Internet of Things data, such as usage reports
from energy meters and home appliances.
Graph data, such as information about how
users are connected to one another.
Cloud Bigtable storage model
• Cloud Bigtable stores data in massively scalable
tables, each of which is a sorted key/value map.
• The table is composed of rows, each of which
typically describes a single entity, and columns,
which contain individual values for each row.
• Each row is indexed by a single row key, and
columns that are related to one another are
typically grouped together into a column family.
• Each column is identified by a combination of the
column family and a column qualifier, which is a
unique name within the column family.
• Each row/column intersection can contain
multiple cells at different timestamps,
providing a record of how the stored data has
been altered over time.
• Cloud Bigtable tables are sparse; if a cell does
not contain any data, it does not take up any
space.
• For example, suppose you're building a social
network for United States presidents—let's
call it Prezzy. Each president can follow posts
from other presidents. The following
illustration shows a Cloud Bigtable table that
tracks who each president is following on
Prezzy:
• The table contains one column family, the follows
family. This family contains multiple column
qualifiers.
• Column qualifiers are used as data. This design
choice takes advantage of the sparseness of Cloud
Bigtable tables, and the fact that new column
qualifiers can be added on the fly.
• The username is used as the row key. Assuming
usernames are evenly spread across the alphabet,
data access will be reasonably uniform across the
entire table.
Cloud Bigtable architecture
•  all client requests go through a front-end server
before they are sent to a Cloud Bigtable node.
(In the original Bigtable whitepaper, these nodes
are called "tablet servers.")
• The nodes are organized into a Cloud Bigtable
cluster, which belongs to a Cloud Bigtable
instance, a container for the cluster
• Each node in the cluster handles a subset of
the requests to the cluster. By adding nodes
to a cluster, you can increase the number of
simultaneous requests that the cluster can
handle, as well as the maximum throughput
for the entire cluster.
• If you enable replication by adding a second
cluster, you can also send different types of
traffic to different clusters
• you can fail over to one cluster if the other
cluster becomes unavailable.
• A Cloud Bigtable table is sharded (partitions for
fast access)into blocks of contiguous rows,
called tablets, to help balance the workload of
queries. (Tablets are similar to HBase regions.)
Tablets are stored on Colossus, Google's file
system, in SSTable format.

• An SSTable(Sorted Strings Table which stores a


set of immutable row fragments in sorted order
based on row keys. SSTable files of a column
family are stored in its respective column family
directory. )
• It provides a persistent, ordered immutable map
from keys to values, where both keys and values
are arbitrary byte strings.

• Each tablet is associated with a specific Cloud


Bigtable node. In addition to the SSTable files, all
writes are stored in Colossus's shared log as soon
as they are acknowledged by Cloud Bigtable,
providing increased durability.
• Importantly, data is never stored in Cloud
Bigtable nodes themselves; each node has
pointers to a set of tablets(A Cloud
Bigtable table is sharded into blocks of
contiguous rows, called tablets, to help balance
the workload of queries.) that are stored on
Colossus. As a result:
• Rebalancing tablets from one node to another is
very fast, because the actual data is not copied.
Cloud Bigtable simply updates the pointers for
each node.
• Recovery from the failure of a Cloud Bigtable
node is very fast, because only metadata needs
to be migrated to the replacement node.
• When a Cloud Bigtable node fails, no data is
lost.
Big Table
• BigTable is a distributed storage system for managing
semi-structured data.
• Designed to scale to a very large size
– Petabytes of data across thousands of servers
• Used for many Google projects
– Web indexing, Personalized Search, Google Earth, Google
Analytics, Google Finance, …
• Flexible, high-performance solution for all of
Google’s products

04/21/20 115
Big Table
• Lots of (semi-)structured data at Google
– URLs:
• Contents, crawl metadata, links, anchors, pagerank, …
– Per-user data:
• User preference settings, recent queries/search results, …
– Geographic locations:
• Physical entities (shops, restaurants, etc.), roads, satellite image
data, user annotations, …
• Scale is large
– Billions of URLs, many versions/page (~20K/version)
– Hundreds of millions of users, thousands or q/sec
– 100TB+ of satellite image data

04/21/20 116
BigTable
Goals
•Want asynchronous processes to be continuously
updating different pieces of data
– Want access to most current data at any time
•Need to support:
– Very high read/write rates (millions of IOPS)
– Efficient scans over all or interesting subsets of data
– Efficient joins of large one-to-one and one-to-many datasets
•Often want to examine data changes over time
– E.g. Contents of a web page over multiple crawls

04/21/20 117
BigTable
Features
•Distributed multi-level map
•Fault-tolerant, persistent
•Scalable
– Thousands of servers
– Terabytes of in-memory data
– Petabyte of disk-based data
– Millions of reads/writes per second, efficient scans
•Self-managing
– Servers can be added/removed dynamically
– Servers adjust to load imbalance

04/21/20 118
BigTable: Building Blocks
1. Google File System (GFS): Raw storage
– stores persistent data (SSTable file format for storage )

2. Scheduler: schedules jobs onto machines


– Scheduler: schedules jobs involved in BigTable serving

3. Lock service: distributed lock manager


– Lock service: master election, location bootstrapping

4. MapReduce: simplified large-scale data processing


– Map Reduce: often used to read/write BigTable data

04/21/20 119
BigTable: Basic Data Model

04/21/20 120
Basic Data Model
• A BigTable is a sparse, distributed, persistent, multi-
dimensional, sorted map
(row, column, timestamp) -> cell contents

• Good match for most Google applications

04/21/20 121
Basic Data Model: Rows

• Name is an arbitrary string


– Access to data in a row is atomic
– Row creation is implicit upon storing data
• Rows ordered lexicographically
– Rows close together lexicographically usually on one or a
small number of machines
04/21/20 122
Basic Data Model: Columns

• Columns have two-level name structure:


• column_family : optional_qualifier
• Column family
– Unit of access control
– Has associated data type information
• Qualifier gives unbounded columns
– Additional levels of indexing, if desired

04/21/20 123
Basic Data Model : Timestamps

• Used to store different versions of data in a cell


– New writes default to current time, but timestamps for writes can also be
set explicitly by clients
• Lookup options:
– “Return most recent K values”
– “Return all values in timestamp range (or all values)”
• Column families can be marked w/ attributes:
– “Only retain most recent K values in a cell”
– “Keep values until they are older than K seconds ”

04/21/20 124
Example

04/21/20 125
04/21/20 126
Example 1

04/21/20 127
Example 2

04/21/20 128
Example 3

04/21/20 129
Bigtable Example

04/21/20 130
BigTable: Components

04/21/20 131
BigTable: Components
• Client
• One master server
– Responsible for
• Assigning tablets to tablet servers
• Detecting addition and expiration of tablet servers
• Balancing tablet-server load
• Garbage collection
• Many tablet servers
– Tablet servers handle read and write requests to its table
– Splits tablets that have grown too large

04/21/20 132
BigTable: Tablets
• Large tables broken into tablets at row boundaries
– Tablet holds contiguous range of rows
• Clients can often choose row keys to achieve locality
– Aim for ~100MB to 200MB of data per tablet
• Serving machine responsible for ~100 tablets
– Fast recovery:
• 100 machines each pick up 1 tablet for failed machine
– Fine-grained load balancing:
• Migrate tablets away from overloaded machine
• Master makes load-balancing decisions

04/21/20 133
04/21/20 134
BigTable: Tablet Assignment

• Each tablet is assigned to one tablet server at a time.

• Master server keeps track of the set of live tablet


servers and current assignments of tablets to
servers.

• keeps track of unassigned tablets. When a tablet is


unassigned, master assigns the tablet to an tablet
server with sufficient room.

04/21/20 135
Big Table API
• Metadata operations
– Create/delete tables, column families, change metadata
• Writes (atomic)
– Set(): write cells in a row
– DeleteCells(): delete cells in a row
– DeleteRow(): delete all cells in a row
• Reads
– Scanner: read arbitrary cells in a bigtable
• Each row read is atomic
• Can restrict returned rows to a particular range
• Can ask for just data from 1 row, all rows, etc.
• Can ask for all columns, just certain column families, or specific columns

04/21/20 136
Benefits of BigTable
• Elastic scaling
• Bigger Data Handling Capability
• Maintaining NoSQL Servers is Cheaper
• Lesser Server Cost
• No Schema or Fixed Data model
• Integrated Caching Facility

04/21/20 137
Chubby
(Google Distributed Lock Service)
Chubby : Distributed Lock service
• provide reliable storage to the loosely coupled distributed
system
• Synchronize access to shared resources
• Goals
– High availability
– Reliability
• Anti-goals:
– High performance
– Throughput
– Storage capacity
04/21/20 139
Chubby
• Presents a simple distributed file system
• Clients can open/close/read/write files
– Reads and writes are whole-file
– Also supports advisory reader/writer locks
– Clients can register for notification of file update

04/21/20 140
Chubby : Architecture

04/21/20 141
Chubby : Architecture
• Chubby cell has set of servers (or replicas)
• All client requests are directed to master
– updates propagated to replicas
– master periodically polls for failed replicas
• Chubby cell is usually 5 replicas
– 3 must be alive for cell to be viable
• Periodically elected master from 5 replicas
• How do replicas in Chubby agree on their own master,
official lock values?
– PAXOS (distributed) algorithm

04/21/20 142
Chubby : Architecture
• Master election is simple: all replicas try to acquire a
write lock on designated file. The one who gets the
lock is the master.
– Master can then write its address to file; other replicas can
read this file to discover the chosen master name.

• Paxos is a family of algorithms (by Leslie Lamport)


designed to provide distributed consensus in a
network of several processors

© Spinnaker Labs, Inc.


Chubby : Architecture
• Support a similar file system as UNIX
• The nodes are opened to obtain the unix file
descriptor(a file descriptor is an abstract indicator
used to access a file or other input/output
resource, ) known as handles
• The specifier for the handle include check digit for
the guess handle for client
• Handle sequence no
• Mode information for recreating the lock state
when master changes
Chubby : Architecture
• Reader and writer locks are implemented using Files and
directories.
• Exclusive permission for a lock in the writer mode can be
obtained by a single client, and any number of clients
sharing a lock in reader mode.
• The nature of lock is Advisory and conflict occurs only
when same lock is requested again for acquisition
• implements Event notification mechanism
• Support Consistent caching
• Node meta-data include Access Control Lists

04/21/20 145
Design - Sequencer for lock
• The Status of locks after they acquired can described
using descriptor strings called as sequencer
– introduce sequence numbers into interactions
that use locks
– lock holder requests a sequencer, pass it to file
server to validate

04/21/20 146
Design - Events
• Client subscribes to events when creating handle
• Event types
– file contents modified
– child node added / removed / modified
– Chubby master failed over
– handle / lock have become invalid
– lock acquired / conflicting lock request (rarely used)
• Events are delivered when the action that corresponds to it is
completed
• Chubby is implemented using following API
– Creation of handle using Open() method
– Destruction of handles using Close() method

04/21/20 147
Mobile Cloud Computing
Outline
• Introduction
• Definition
• Architecture
• Benefits
• Challenges
MCC
• MCC is the combination of cloud computing, mobile
computing and wireless networks to bring rich
computational resources to mobile users, network
operators, as well as cloud computing providers.
• NIST defines MCC as “a model for enabling
convenient, on-demand network access to a shared
pool of configurable computing resources (e.g.
network, server, storage, applications and services)
that can be rapidly provisioned and released with
minimal management effort or service provider
interaction.”
Architecture
• Cloud as a large scale distributed systems is based
on various servers that are connected to data
centers.
• Layered Architecture for cloud
• Data Center
• IaaS
• PaaS
• SaaS
Architecture
Benefits
• Extended Lifetime of the battery
• Improved data storage capacity and processing
power
• Improved reliability
• Dynamic provisioning
• Scalability
• Multi-tenancy
• Ease of Integration
Challenges
• Challenges at Mobile End
– Network latency and limited bandwidth
– Service availability
– Heterogeneity of platform, devices and service providers
• Challenges at cloud End
– Computing offload
– Security
– Enhancing the efficiency of Data Access
– Context Aware Mobile cloud Services
Summary
• Definition
• Architecture
• Benefits
• Challenges
AAA Administration for clouds
Outline
• AAA model
• SSO for Clouds
• Authentication management in cloud
• Authorization management in clouds
• Accounting for Resource utilization in cloud
AAA Model
• AAA (a sequence of events when a user logs in)
• Authentication- Security server first checks if the
login name and password are legitimate then user
is authenticated
• Authorization - The user is given access to
modules of application or sets of data that he can
use or view
• Accounting -The server keeps a log or account of
all the resources utilized and the user activities
1. Authentication

• Authentication is to validation of a user’s identity to


permit or reject a login

• It Requires identifier and its corresponding credential


Authentication Management
• Identity management is key building block for
successful use of heterogeneous cloud
• Cloud user applications can authenticate by identity
provider (IdP)
• LDAP-authentication service where identities and
claims are stored and managed for principal using
cloud
• Cloud security Alliance - published a set of guideline
on cloud based identity issues such as provisioning,
authentication, federation and profile management
2. Authorization
• Authorization permits a user to do certain activities
and denies other activities
• AAA server decides whether the user should be
allowed or denied execution of the command
– Data the user can view
– Data the user can edit
– Command the user can run
– Applications the user can start
– Level of access within each application or system
2. Authorization

• This information is kept in Role Based Access


Control(RBAC) database

• Authorization can be also be based on the time of


day, the IP network, the requested QoS, the number
of logged-in users
Authorization Management
• CSP has at least two levels
– Administrator
– User
• Approaches
– Cloud authorization
– Enterprise Authorization
– Cloud Application with Enterprise Policies
3. Accounting of resources
• Accounting does not allow or deny anything.
• It just keeps a log of resources consumption such as
– Identity of user
– Amount of resources used
– Start and end time of use
– Amount of data transferred
– Length of connection
– Purpose of using the resource
– Nature of service delivered
3. Accounting of resources
• Two types of accounting reports
– Real Time Accounting Information
• Delivered concurrently with resource consumption
• Useful for cloud users to track usage and predict the bill, expected
at the end of payment cycle
– Batch Accounting Information
• Delivered at a later time
• Useful for studying utilization trends and capacity
Accounting for Resource Utilization
• The objective of CSP to bill the consumer at the end of each
month based on utilization
• Utilization include
– Amount of time the user is logged in and actively using the application
– H/w resources used such as processing power, memory and storage
space
– Amount of data transferred
– The billing rates could be different based on origin and destination of
data transfer
• Payment per billing period must be based on Reserved
resources or Utilized resources
AAA server
AAA server
SSO for cloud
• Single Sign-On
• a property of access controls for several related but
independent systems
• A user logs in once and gains access to all systems
• Single Sign Off
• reverse where signing out at any application ends access
to all system
• JOSSO
• java based tool for web applications which enables Java
Authentication and Authorization Service to authenticate
users and enforce access control
SSO for cloud
• Benefits
– Access to resources from different CSP using single
authentication
– Reduces phishing attack
– Improves user efficiency ease of access to resources
– Reduces administrative overhead of managing passwords
– Centralizes reporting for better adherence to compliance
• Drawback
– Single point of failure due to n/w link failure
– Grants permission to all resources even though not required
– No custom authentication or access control
Industry Implementation
• AAA server operate on following protocols
• Remote Authentication Dial-In User Service
(RADIUS)
• Diameter - successor to RADIUS
• Terminal Access Controller Access Control System
protocol (TACACS+)
• Kerberos Protocol
SAML
• Security Assertion Markup Language
• an XML-based open-standard data format for exchanging
authentication and authorization data between
parties(between an identity provider and a service
provider.)
• a product of the OASIS Security Services Technical
Committee.
• Following groups involved in SAML
– Identity providers(IdP)
– Service Providers
– Federated Identity
SAML
• Benefits
– Platform Neutral
– Loose coupling of Directories
– Better End-user Experience
– Reduced administrative overhead
– Localizes Authentication to IdP
Summary
• AAA model
• SSO for Clouds
• Authentication management in cloud
• Authorization management in clouds
• Accounting for Resource utilization.
THANK YOU
Programming Support for Amazon
(AWS)

04/21/20 179
AWS
• AWS is a secure cloud services platform, offering
compute power, database storage, content delivery and
other functionality to help businesses scale and grow.
• solutions to build sophisticated applications with
increased flexibility, scalability and reliability.
• The AWS Cloud provides a broad set of infrastructure
services, such as computing power, storage options,
networking and databases, delivered as a utility: on-
demand, available in seconds, with pay-as-you-go
pricing.
Advantages
• Trade capital expense for variable expense
• Benefit from massive economies of scale
• Stop guessing capacity
• Increase speed and agility
• Stop spending money on running and
maintaining data centers
• Go global in minutes
AWS
• Amazon operates at least 30 data centers in its global
network, with another 10 to 15 on the drawing board.
• Amazon doesn’t disclose the full scope of its
infrastructure, but third-party estimates peg its U.S.
data center network at about 600 megawatts of IT
capacity.(<100,000 servers per data center)
• Figuring out the upper end of the range is more
difficult, but could range as high as 5.6 million,
according to calculations by Timothy Prickett Morgan at
the Platform.
AWS data centers
Cloud Solutions
• Solutions- Mobile Services, Websites, Backup
and Recovery
• Hundreds of thousands of customers have joined
the Amazon Web Services (AWS) community and
use AWS solutions to build their businesses.
• The AWS cloud computing platform provides the
flexibility to build your application, regardless of
your use case or industry.
• You can save time, money, and let AWS manage
your infrastructure, without compromising
scalability, security, or dependability.
Cloud Products & Services
• Amazon Web Services (AWS) offers a broad
set of global compute, storage, database,
analytics, application, and deployment
services that help organizations move faster,
lower IT costs, and scale applications.
• Eg EC2, S3, RDS, Aurora
Amazon EC2
• Amazon Elastic Compute Cloud (Amazon EC2)
is a web service that provides secure, resizable
compute capacity in the cloud. It is designed
to make web-scale cloud computing easier for
developers.
Amazon S3
• Amazon Simple Storage Service (Amazon S3) is
object storage with a simple web service
interface to store and retrieve any amount of
data from anywhere on the web.
Amazon RDS
• Amazon Relational Database Service makes it
easy to set up, operate, and scale a relational
database in the cloud.
AWS Lambda
• AWS Lambda lets you run code without
provisioning or managing servers. You pay
only for the compute time you consume -
there is no charge when your code is not
running.
Amazon QuickSight
• Amazon QuickSight is a fast, cloud-powered
business analytics service that makes it easy
to build visualizations, perform ad-hoc
analysis, and quickly get business insights
from your data.
AWS IoT
• AWS IoT is a managed cloud platform that lets
connected devices easily and securely interact
with cloud applications and other devices.
Amazon Aurora
• Amazon Aurora is a MySQL and PostgreSQL-
compatible relational database engine that
combines the speed and availability of high-end
commercial databases with the simplicity and
cost-effectiveness of open source databases.
• Amazon Aurora provides up to five times better
performance than MySQL with the security,
availability, and reliability of a commercial
database at one tenth the cost.
Amazon Aurora
Amazon EC2
Amazon EC2
• Amazon Elastic Compute Cloud (Amazon EC2) is a web
service that provides secure, resizable compute capacity in
the cloud. It is designed to make web-scale cloud
computing easier for developers.
• Benefits
– Elastic Web-Scale Computing
– Completely Controlled
– Flexible Cloud Hosting Services
– Integrated
– Reliable
– Secure
– Inexpensive
– Easy to start
Amazon EC2 Pricing

• Amazon EC2 is free to try.


• There are four ways to pay for Amazon EC2
instances: On-Demand, Reserved Instances,
and Spot Instances. we can also pay
for Dedicated Hosts which provide you with
EC2 instance capacity on physical servers
dedicated for your use.
Amazon EC2 On-Demand Pricing
• On-Demand instances let you pay for compute capacity by the
hour with no long-term commitments. This frees you from the
costs and complexities of planning, purchasing, and
maintaining hardware and transforms what are commonly
large fixed costs into much smaller variable costs.

• The pricing below includes the cost to run private and public
AMIs on the specified operating system (“Windows Usage”
prices apply to Windows Server 2003 R2, 2008, 2008 R2,
2012, 2012 R2, and 2016). Amazon also provides you with
additional instances for Amazon EC2 running Microsoft
Windows with SQL Server, Amazon EC2 running SUSE Linux
Enterprise Server, Amazon EC2 running Red Hat Enterprise
Linux and Amazon EC2 running IBM that are priced differently.
Amazon EC2 Spot Instances Pricing
• Spot instances provide you with access to unused Amazon EC2 capacity at steep
discounts relative to On-Demand prices.  The Spot price fluctuates based on the
supply and demand of available unused EC2 capacity.
• When you request Spot instances, you specify the maximum Spot price you are
willing to pay.  Your Spot instance is launched when the Spot price is lower than
the price you specified, and will continue to run until you choose to terminate it or
the Spot price exceeds the maximum price you specified.
• With Spot instances, you will never be charged more than the maximum price you
specified.  While your instance runs, you are charged the Spot price that is in effect
for that period.  If the Spot price exceeds your specified price, your instance will
receive a two-minute notification before it is terminated, and you will not be
charged for the partial hour that your instance has run.
• If you include a duration requirement with your Spot instances request, your
instance will continue to run until you choose to terminate it, or until the specified
duration has ended; your instance will not be terminated due to changes in the
Spot price.
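A minimal sketch of submitting a Spot request with boto3, including the maximum price; the AMI ID, price, and instance type are placeholder assumptions:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.request_spot_instances(
    SpotPrice="0.05",        # maximum USD per hour we are willing to pay
    InstanceCount=1,
    Type="one-time",         # do not resubmit if the instance is reclaimed
    LaunchSpecification={
        "ImageId": "ami-0123456789abcdef0",   # hypothetical AMI ID
        "InstanceType": "m5.large",
    },
)
request_id = response["SpotInstanceRequests"][0]["SpotInstanceRequestId"]
print("Spot request submitted:", request_id)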
Amazon EC2 Reserved Instances
Pricing
• Reserved Instances provide you with a
significant discount (up to 75%) compared to
On-Demand instance pricing. In addition,
when Reserved Instances are assigned to a
specific Availability Zone, they provide a
capacity reservation, giving you additional
confidence in your ability to launch instances
when you need them.  
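A sketch of how such a purchase might be scripted with boto3; the instance type and offering filters are placeholder assumptions:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Find offerings matching the desired instance type and payment option.
offerings = ec2.describe_reserved_instances_offerings(
    InstanceType="t3.medium",
    OfferingType="All Upfront",
    ProductDescription="Linux/UNIX",
)

# Purchase the first matching offering (a real script would compare terms).
offering_id = offerings["ReservedInstancesOfferings"][0]["ReservedInstancesOfferingId"]
ec2.purchase_reserved_instances_offering(
    ReservedInstancesOfferingId=offering_id,
    InstanceCount=1,
)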
Amazon EC2 Dedicated Hosts Pricing
• The price for a Dedicated Host varies by instance family, region, and
payment option. Regardless of the quantity or size of the instances that
you choose to launch on a particular Dedicated Host, you pay hourly for
each active Dedicated Host, and you are not billed for instance usage.
• When you allocate a Dedicated Host for use, you must choose an
instance type configuration for the host. This selection will define the
number of sockets and physical cores per host, the type of instance you
can run on the host, and the number of instances that you can run on
each host.
• After you have allocated a Dedicated Host, you will pay On-Demand
unless you have a Dedicated Host Reservation. A Dedicated Host
Reservation provides you with a discount of up to 70% compared to On-
Demand pricing.
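A minimal sketch of allocating a Dedicated Host with boto3; the Availability Zone and instance type are placeholder assumptions:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.allocate_hosts(
    AvailabilityZone="us-east-1a",
    InstanceType="m5.large",   # fixes the sockets/cores and instance sizes supported
    Quantity=1,
    AutoPlacement="on",        # allow untargeted launches to land on this host
)
print("Dedicated Host ID:", response["HostIds"][0])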
AWS Marketplace (CSB)
• AWS Marketplace provides a new sales
channel for ISVs and Consulting Partners to
sell their solutions to AWS customers. They
make it easy for customers to find, buy,
deploy and manage software solutions,
including SaaS, in a matter of minutes.
Amazon S3
Amazon S3
• Amazon Simple Storage Service is object storage with a simple
web service interface to store and retrieve any amount of data
from anywhere on the web. It is designed to deliver
99.999999999% durability, and scale past trillions of objects
worldwide.
• Customers use S3 as primary storage for cloud-native
applications; as a bulk repository, or "data lake," for analytics;
as a target for backup & recovery and disaster recovery; and
with serverless computing.
• It's simple to move large volumes of data into or out of
Amazon S3 with Amazon's cloud data migration options.
• Once data is stored in S3, it can be automatically tiered into
lower cost, longer-term cloud storage classes like S3 Standard
- Infrequent Access and Amazon Glacier for archiving.
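A minimal sketch of storing, retrieving, and auto-tiering objects with boto3; the bucket (which must already exist and be globally unique), key, and transition periods are placeholder assumptions:

import boto3

s3 = boto3.client("s3")
bucket = "demo-bucket-name"   # hypothetical, pre-created bucket

# Store and retrieve an object by key.
s3.put_object(Bucket=bucket, Key="reports/2020/q1.csv", Body=b"id,value\n1,42\n")
obj = s3.get_object(Bucket=bucket, Key="reports/2020/q1.csv")
print(obj["Body"].read())

# Tier objects to cheaper storage classes automatically as they age.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-reports",
            "Filter": {"Prefix": "reports/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},   # infrequent access
                {"Days": 365, "StorageClass": "GLACIER"},      # long-term archive
            ],
        }],
    },
)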
Amazon S3
Benefits
•Simple
•Durable
•Scalable
•Secure
•Available
•Low cost
•Simple Data Transfer
•Integrated
•Easy to manage
Amazon S3 Use cases
• Backup & Archiving
• Content Storage & Distribution
• Big Data Analytics
• Static Website Hosting
• Cloud-native Application Data
• Disaster Recovery
• Hybrid Cloud Storage for Bursting, Tiering and Migration
Amazon S3 Pricing model
• Storage Pricing (varies by region)
• Request Pricing
• Amazon S3 Storage Management Pricing
• Data Transfer Pricing
• Amazon S3 Transfer Acceleration Pricing
• Cross-Region Replication Pricing
• CRR is an Amazon S3 feature that automatically replicates data
across AWS regions. With CRR, every object uploaded to an S3
source bucket is automatically replicated to a destination bucket in a
different AWS region that you choose. You pay the Amazon S3
charges for storage, requests, and inter-region data transfer for the
replicated copy of data in addition to the storage charges for the
primary copy. Pricing for the replicated copy of storage is based on
the destination region, while pricing for requests and inter-region
data transfer is based on the source region.
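A minimal sketch of how a CRR rule might be applied with boto3; the bucket names and IAM role ARN are placeholder assumptions, and both buckets must already exist with versioning enabled:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="demo-source-bucket",   # hypothetical bucket in the source region
    ReplicationConfiguration={
        # Role that S3 assumes to copy objects on your behalf.
        "Role": "arn:aws:iam::123456789012:role/s3-crr-role",
        "Rules": [{
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Prefix": "",   # empty prefix replicates all new objects
            "Destination": {"Bucket": "arn:aws:s3:::demo-destination-bucket"},
        }],
    },
)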
Amazon Relational Database Service
(RDS)
Amazon Relational Database Service
• Amazon RDS makes it easy to set up, operate, and scale a relational
database in the cloud. It provides cost-efficient and resizable capacity
while automating time-consuming administration tasks such as
hardware provisioning, database setup, patching and backups. It
frees you to focus on your applications so you can give them the fast
performance, high availability, security and compatibility they need.
• Amazon RDS is available on several database instance types -
optimized for memory, performance or I/O - and provides you with
six familiar database engines to choose from,
including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle,
and Microsoft SQL Server. You can use the AWS Database Migration
Service to easily migrate or replicate your existing databases to
Amazon RDS.
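A minimal sketch of creating a managed MySQL instance with boto3; the identifier, credentials, and sizes are placeholder assumptions:

import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="demo-mysql",
    DBInstanceClass="db.t3.micro",
    Engine="mysql",
    MasterUsername="admin",
    MasterUserPassword="change-me-please",  # use a secrets store in practice
    AllocatedStorage=20,                    # GiB
    BackupRetentionPeriod=7,                # automated daily backups kept 7 days
    MultiAZ=False,                          # True adds a standby for failover
)
# RDS handles patching, backups, and failover; applications simply connect
# to the endpoint reported by describe_db_instances().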
Amazon RDS
Benefits
•Easy to Administer
•Highly Scalable
•Available and Durable
•Fast
•Secure
•Inexpensive
Summary
• Google App Engine
• Google File System
• Big Table
• Chubby
• Amazon EC2
• Amazon S3
• Amazon EBS
• Amazon Simple DB
Thank you !
EC2
• AWS offers cloud programming support for its EC2
environment through various systems, including
Simple Storage Service (S3), Elastic Block Store (EBS),
and SimpleDB
• Benefits
– Up and down scalability of web services
– Complete control over computing resources
– Flexibility of services for cloud hosting
– Support for other Amazon web services
– High reliability and security
– Cost efficiency
Amazon S3
• Amazon Simple Storage Service (Amazon S3), provides
developers and IT teams with secure, durable, highly-
scalable object storage.
• Amazon S3 is easy to use, with a simple web services
interface to store and retrieve any amount of data from
anywhere on the web.
• With Amazon S3, you pay only for the storage you actually
use. There is no minimum fee and no setup cost.
• Amazon S3 provides cost-effective object storage for cloud
applications, content distribution, backup and archiving,
disaster recovery, and big data analytics.
Amazon S3
• Design requirements
– Durable- durability of 99.999999999% of objects.
– Low Cost
– Available- 99.99% availability of objects
– Secure- data transfer over SSL and automatic encryption of your data
once it is uploaded.
– Scalable
– Send Event Notifications
– High Performance
– Integrated
– Easy to use
Amazon EBS
• Amazon Elastic Block Store (Amazon EBS) provides
persistent block level storage volumes for use with Amazon
EC2 instances in the AWS Cloud. 
• Each Amazon EBS volume is automatically replicated within
its Availability Zone to protect you from component failure,
offering high availability and durability.
• Amazon EBS volumes offer the consistent and low-latency
performance needed to run your workloads.
• With Amazon EBS, you can scale your usage up or down
within minutes – all while paying a low price for only what
you provision.
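A minimal sketch of creating and attaching a volume with boto3; the instance ID and device name are placeholder assumptions, and the volume must be created in the instance's Availability Zone:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a general-purpose SSD volume.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=100,            # GiB
    VolumeType="gp2",
)

# Wait until the volume is ready, then attach it to a running instance.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",   # hypothetical instance
    Device="/dev/sdf",                  # block device name exposed to the OS
)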
Amazon EBS
• Amazon storage patterns that led to EBS
– Storing key-value data – primary-key based access (the
pattern that led to the creation of S3)
– Storing data in a simple, structured form
– Storing data in blocks (the pattern EBS itself addresses)
• Benefits
– Reliable, secure , Consistent storage system
– High performance and low-latency performance
– Quickly scale up, easily scale down
– Backup, restore, innovate
Amazon EBS
• EBS is used in a variety of scenarios
– EBS as database storage
– EBS in applications developed for
enterprises
– EBS as backing storage for NoSQL databases
– EBS in development and testing
environments
– EBS for business continuity
– EBS for file workloads
Amazon SimpleDB
• Amazon SimpleDB is a fast, scalable, fully managed
database service.
• It is a highly available and flexible non-relational data
store that offloads the work of database administration.
• Developers simply store and query data items via web
services requests and Amazon SimpleDB does the rest.
• Amazon SimpleDB is optimized to provide high
availability and flexibility, with little or no administrative
burden.
Amazon SimpleDB
• Amazon SimpleDB creates and manages multiple
geographically distributed replicas of your data
automatically to enable high availability and data durability.
• The service charges only for the resources actually
consumed in storing your data and serving requests. You
can change your data model on the fly, and data is
automatically indexed for you.
• With Amazon SimpleDB, you can focus on application
development without worrying about infrastructure
provisioning, high availability, software maintenance,
schema and index management, or performance tuning.
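A minimal sketch using the legacy boto (v2) library, which exposed a SimpleDB client (boto3 does not include one); the domain, item, and attributes are placeholder assumptions:

import boto

sdb = boto.connect_sdb()                # credentials come from the environment
domain = sdb.create_domain("users")     # a domain is a schema-less table

# Store an item; attributes can be added or changed at any time.
item = domain.new_item("user-001")
item["name"] = "Alice"
item["city"] = "Pune"
item.save()

# Query with SimpleDB's SQL-like select syntax; every attribute is
# automatically indexed, so no index management is required.
for result in domain.select("select * from users where city = 'Pune'"):
    print(result.name, dict(result))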
Amazon SimpleDB
Characteristics
– Scalable – automatic storage scaling, provisioned throughput, and a fully
distributed, shared-nothing architecture
– Easy Administration
– Flexible
– Fast, Predictable Performance
– Built-in Fault Tolerance
– Schema-less
– Strong consistency, Atomic Counters
– Cost Effective
– Secure
– Integrated Monitoring
Q2) Describe Platform as a Service
OR
Q2) Describe Database as a Service
Q3) Illustrate OpenStack Architecture
OR
Q3) Illustrate GFS Architecture