
Unit 5: Case Studies and Advancements

CLOUD COMPUTING (TCS 074)
Course Details: B.Tech 7th Sem

Mr. Saurabh Gupta
Department of CSE



Unit Content

– Federation in the Cloud
– Four Levels of Federation
– Federated Services and Applications
– Future of Federation

• Cloud Hadoop
• Map Reduce
• Virtual Box
• Google App Engine
• Programming Environment for Google App Engine
• Open Stack


Cloud Hadoop

• The most well-known technology used for Big Data is Hadoop.

• It is a large-scale batch data-processing system built around a simple programming model.

• The Hadoop software library is a framework for distributed storage and processing.

• It is maintained by the Apache Software Foundation (version 1.0 was released in 2011) and is written in Java.



Hadoop

• Hadoop is open-source software.

• It is a framework, not a single application.

• It provides massive storage.

• It provides large-scale processing power.



Why Hadoop

• Distributed cluster system

• Platform for massively scalable applications

• Enables parallel data processing



Big Data

• Big data is a term used to describe the very large amounts of unstructured and semi-structured data a company creates.

• It typically refers to petabytes and exabytes of data.

• Loading that much data into a relational database for analysis would take too much time and cost.

• Facebook, for example, has almost 10 billion photos taking up about 1 petabyte of storage (roughly 100 KB per photo on average).



So What Is the Problem?

• Processing such large data sets in a relational database is very difficult.

• It would take too much time and cost too much to process the data.



Problems in Distributed Computing

• There is always a chance of hardware failure.

• The data must be combined after analysis.

• Combining data from all the disks is messy and error-prone.



Hadoop Parts

• Hadoop was created to solve these problems.

• It has two main parts:

– Hadoop Distributed File System (HDFS)

– Data processing framework (MapReduce)



Hadoop Distributed File System

• It ties many small, reasonably priced machines together into a single cost-effective compute cluster.

• Data and application processing are protected against hardware failure.

• If a node goes down, jobs are automatically redirected to other nodes so the distributed computation does not fail.

• It automatically stores multiple copies of all data.

• It provides a simplified programming model that allows users to quickly read and write across the distributed system.
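The simplified read/write model can be sketched with Hadoop’s Java FileSystem client, as below. This is a minimal illustration, not the course’s reference code: the NameNode URI and both file paths are assumed values, and HDFS replicates the written file across nodes transparently.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: write a local file into HDFS and read it back.
public class HdfsPutGet {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed NameNode URI
        FileSystem fs = FileSystem.get(conf);
        // Upload: HDFS stores multiple replicas of the file automatically.
        fs.copyFromLocalFile(new Path("/tmp/local.txt"), new Path("/data/remote.txt"));
        // Download a copy back to the local disk.
        fs.copyToLocalFile(new Path("/data/remote.txt"), new Path("/tmp/copy.txt"));
        fs.close();
    }
}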



Map Reduce

• MapReduce is a programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster.

• It is an associated implementation for processing and generating large data sets.

• A MAP function processes a key/value pair to generate a set of intermediate key/value pairs.

• A REDUCE function merges all intermediate values associated with the same intermediate key.
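As a hedged sketch, the two functions look like this in Hadoop’s Java MapReduce API for the classic word-count example; the class names WordMapper and WordReducer are illustrative, not part of any standard library.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// MAP: process one line of input and emit an intermediate (word, 1) pair per word.
class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
        for (String w : line.toString().split("\\s+")) {
            if (!w.isEmpty()) ctx.write(new Text(w), ONE);
        }
    }
}

// REDUCE: merge all intermediate values that share the same intermediate key.
class WordReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context ctx)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) sum += c.get();
        ctx.write(word, new IntWritable(sum));
    }
}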



Map Reduce

• MapReduce is a programming model developed at Google.

• Objective: implement large-scale search.

• Text processing on massively scalable web data stored using BigTable and the GFS distributed file system.

• Designed for processing and generating large volumes of data via massively parallel computations, utilizing tens of thousands of processors at a time.



Hadoop Advantages

• Computing power

• Flexibility

• Fault Tolerance

• Low Cost

• Scalability



Hadoop Disadvantages

• Integration with existing systems: Hadoop is not optimized for ease of use. Installing and integrating it with existing databases might prove difficult, especially since there is no software support provided.

• Administration and ease of use: Hadoop requires knowledge of MapReduce, while most data practitioners use SQL. This means significant training is required to operate Hadoop clusters.

• Security: Hadoop lacks the level of security functionality needed by many organizations.



Map Reduce

• Two phases of MapReduce: the Map operation and the Reduce operation.

• Map phase:

– Each mapper reads approximately 1/M of the input from the global file system, using locations given by the master.

– The Map operation transforms one set of key-value pairs to another.

– Each mapper writes its computation results in one file per reducer.

– Files are sorted by key and stored on the local file system.

– The master keeps track of the location of these files.



Map Reduce

• Reduce phase:

– The master informs the reducers where the partial computations are stored, in local files on the respective mappers.

– The reducers make remote procedure call requests to the mappers to fetch the files.

– Each reducer groups the results of the map step by key and performs a function f on the list of values corresponding to each key.

– Final results are written back to the GFS file system.
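To tie the two phases together, here is a minimal job driver for the word-count mapper and reducer sketched earlier; the input and output paths are command-line arguments and purely illustrative. For the input "the cat the", the map phase emits (the,1), (cat,1), (the,1), and the reduce phase merges these into (the,2) and (cat,1).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver: wires the map and reduce phases together and submits the job.
public class WordCountJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountJob.class);
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(WordReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}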



Daily Quiz

1. _______ maps input key/value pairs to a set of intermediate key/value pairs.
a) Mapper
b) Reducer
c) Both Mapper and Reducer
d) None of the mentioned

2. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in ____________
a) Java
b) C
c) C#

3. Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster.
a) MapReduce
b) Map
c) Reducer
d) All of the mentioned



Virtual Box

• Virtual Box is open-source software for virtualizing the x86 computing architecture.

• It acts as a hypervisor, creating a VM (virtual machine) in which the user can run
another OS (operating system).

• The operating system in which Virtual Box runs is called the "host" OS.

• Virtual Box supports Windows, Linux, or macOS as its host OS.



Virtual Box

• Virtual Box was originally developed by Innotek GmbH.

• It was released on January 17, 2007 as an open-source software package.

• The company was later purchased by Sun Microsystems.

• On January 27, 2010, Oracle Corporation purchased Sun and took over development of Virtual Box.



Google App Engine

What is App Engine?

• Google’s platform to build web applications on the cloud.

• Dynamic web server with full support for common web technologies.

• Automatic scaling and load balancing.

• Transactional data store model.



Google App Engine

• Google’s platform to build web applications on the cloud.

• Dynamic web server, with full support for common web technologies.

• Automatic scaling and load balancing.

• SQL and NoSQL data store models.

• Integration with Google Accounts through APIs.



Why Google App Engine

• Auto Scaling - no need to over-provision.

• Affordable Scaling - prices better than AWS.

• Static Files - static files use Google’s CDN.

• No Config - no need to configure the OS or servers.

• Easy Logs - view logs in the web console.

• Easy Deployment - literally 1-click deploy.

• Easy Security - Google patches the OS/servers.

• Free Quota - 99% of apps will pay nothing.



Language support

• Python v2.5, v2.7

• Java 5, Java 6

• Go
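As a hedged illustration of what a minimal Java application looks like on this platform, here is a plain servlet of the kind App Engine’s Java runtime serves once it is mapped to a URL in the app’s deployment descriptor; the class name and response text are illustrative.

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal request handler: App Engine routes HTTP GETs mapped to this
// servlet and scales instances of it automatically.
public class HelloServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().println("Hello from App Engine");
    }
}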



Google App Engine

Advantages:

• Infrastructure for Security

• Scalability

• Performance and Reliability

• Cost Savings

• Platform Independence

Disadvantages:

• You Are at Google’s Mercy

• Violation of Policies

• Forget Porting

• It Isn’t Free



Open Stack

• Open Stack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter.

• Everything is managed through a dashboard that gives administrators control while empowering users to provision resources through a web interface.



Open Stack Capabilities

▪ Software as a Service (SaaS)

▪ Platform as a Service (PaaS): sits on top of IaaS, e.g. Cloud Foundry

▪ Infrastructure as a Service (IaaS): provisions compute, network, and storage; virtual machines (VMs) on demand

▪ Provisioning

▪ Browser or thin-client access

▪ Snapshotting

▪ Network

▪ Storage for VMs and arbitrary files

▪ Multi-tenancy: a user can be associated with multiple projects



Components

• Compute service - Project: Nova

• Networking service - Project: Neutron

• Object storage service - Project: Swift

• Block storage service - Project: Cinder

• Telemetry service - Project: Ceilometer

• Dashboard service - Project: Horizon



Federation

• Cloud federation is the practice of interconnecting the cloud computing environments of two or more service providers.

• Its purpose is to load-balance traffic and accommodate spikes in demand.

• Cloud federation requires one provider to wholesale or rent computing resources to another cloud provider.



Four Levels of Federation

• The conceptual level addresses the challenges in presenting a cloud federation as a favorable solution compared with using services leased from single cloud providers.

• The logical and operational level identifies and addresses the challenges in devising a framework that enables the aggregation of providers belonging to different administrative domains within the context of a single overlay infrastructure, which is the cloud federation.

• The infrastructural level addresses the technical challenges involved in enabling heterogeneous cloud computing systems to interoperate seamlessly.



Federated Services and Applications

• The federation of cloud resources allows clients to optimize enterprise IT service delivery.

• Federation across different cloud resource pools allows applications to run in the most appropriate infrastructure environments.



Benefits

• The federation of cloud resources allows clients to optimize enterprise IT service delivery.

• It allows a client to choose the best cloud service provider, in terms of flexibility, cost, and availability of services, to meet a particular business or technological need within their organization.



Benefits

• Federation across different cloud resource pools allows applications to run in the most appropriate infrastructure environments.

• The federation of cloud resources also allows an enterprise to distribute workloads around the globe, move data between disparate networks, and implement innovative security models for user access to cloud resources.



Future of Federation

• The federated cloud model is a force for real democratization in the cloud market.

• It’s how businesses will be able to use local cloud providers to connect with customers,
partners and employees anywhere in the world.



Future of Federation

• It’s how end users will finally get to realize the promise of the cloud.

• And, it’s how data center operators and other service providers will finally be able to compete with, and beat, today’s so-called global cloud providers.



Cloud Data Life Cycle

• The data life cycle has seven phases:

1. Generation
2. Use
3. Transfer
4. Transformation
5. Storage
6. Archival
7. Destruction



Cloud Data Life Cycle

• Generation of the Information

• Ownership: Who in the organization owns the user’s data, and how is ownership of the data maintained within the organization?

• Classification: How and when is personally identifiable information classified? Are there any limitations on cloud computing for specific data cases?

• Governance: Ensure that personally identifiable information is managed and protected throughout its life cycle.



Cloud Data Life Cycle

Use of the Information

• Internal vs. External: Is personally identifiable information used only inside the organization, or is it also used outside?

• Third Party: Is the personally identifiable information shared with third parties (organizations besides the parent company holding the data)?

• Appropriateness: Is users’ personally identifiable information being used only for the purposes for which it is intended?

• Discovery/Subpoena: Will the information stored in the cloud enable the organization to comply with legal requirements in legal proceedings?
Cloud Data Life Cycle

• Transfer of the Data

• Public vs. Private Network: Are public networks secure (protected) enough while personally identifiable information is transferred to the cloud?

• Encryption Requirements: Is the personally identifiable information encrypted while transmitted via a public network?

• Access Control: Appropriate access control measures should be applied to personally identifiable information when it is in the cloud.
Cloud Data Life Cycle

• Transformation of Data

• Derivation: While data is being transformed in the cloud, it should be protected, and usage limitations should be imposed on it.

• Aggregation: The data should be aggregated so that it no longer identifies any individual.

• Integrity: Is the integrity of personally identifiable information maintained while it is in the cloud?



Cloud Data Life Cycle

• Storage of Data

• Access Control: Appropriate access controls should be applied to personally identifiable information while it is stored in the cloud, so that only individuals with a need to know can access it.

• Structured vs. Unstructured: How will the stored data enable the organization to access and manage the data in the future?

• Integrity/Availability/Confidentiality: How are data integrity, availability, and confidentiality maintained in the cloud?

• Encryption: Personally identifiable information should be encrypted while it is in the cloud.
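As a hedged sketch of that encryption requirement, the JDK’s own crypto API can encrypt a record with AES-GCM before it is stored; key generation is inlined purely for illustration, since real systems keep keys in a KMS or HSM, never alongside the data.

import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Minimal sketch: encrypt one record with AES-256-GCM before cloud storage.
public class EncryptAtRest {
    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);                                // demo key; use a KMS in practice
        SecretKey key = kg.generateKey();

        byte[] iv = new byte[12];                    // 96-bit nonce for GCM
        new SecureRandom().nextBytes(iv);

        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = c.doFinal("name=Jane Doe".getBytes());

        // Store the IV alongside the ciphertext; it is needed for decryption.
        System.out.println(ciphertext.length + " bytes of ciphertext");
    }
}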
Cloud Data Life Cycle

• Archival

• Legal and Compliance: Personally identifiable information should have specific requirements that dictate how long the data should be stored and archived.

• Off-site Considerations: Does the cloud service provider have the capability for long-term off-site storage, and does it support the archival requirements?

• Media Concerns: Who controls the media, and what is the organization’s ability to recover when the media is lost?

• Retention: For how long should the data be retained in the cloud by the cloud service provider?
Cloud Data Life Cycle

• Destruction of the Data

• Secure: Does the cloud service provider destroy the personally identifiable information obtained from customers, to avoid a breach of information?

• Complete: Is the personally identifiable information completely destroyed (erased beyond recovery)?



Authentication and Authorization

1. Authentication: credentials can be changed in part as and when required by the user.
   Authorization: permissions cannot be changed by the user; they are granted by the owner of the system, and only the owner can change them.

2. Authentication: visible at the user end.
   Authorization: not visible at the user end.

3. Authentication: identified with username, password, face recognition, retina scan, fingerprints, etc.
   Authorization: carried out through access rights to resources, using pre-defined roles.

4. Authentication: users or persons are verified.
   Authorization: users or persons are validated.

5. Authentication: done before the authorization process.
   Authorization: done after the authentication process.

6. Authentication: usually needs the user’s login details.
   Authorization: needs the user’s privilege or security levels.

7. Authentication determines whether the person is a user or not.
   Authorization determines what permissions the user has.



Authentication and Authorization

• Authentication generally transmits information through an ID Token; authorization generally transmits information through an Access Token.

• The OpenID Connect (OIDC) protocol is generally in charge of the user authentication process; the OAuth 2.0 protocol governs the overall user authorization process.

• Popular authentication techniques: password-based authentication, passwordless authentication, 2FA/MFA (two-factor / multi-factor authentication), single sign-on (SSO), and social authentication.

• Popular authorization techniques: role-based access control (RBAC), JSON Web Token (JWT) authorization, SAML authorization, OpenID authorization, and OAuth 2.0 authorization.

• Authentication credentials can be changed in part as and when required by the user; authorization permissions cannot be changed by the user, as they are granted by the owner of the system.

• Example: Employees in a company are required to authenticate through the network before accessing their company email; after an employee successfully authenticates, the system determines what information the employee is allowed to access.



Multi-Factor Authentication

• Multifactor authentication (MFA) is an account login process that requires multiple methods of authentication, from independent categories of credentials, to verify a user’s identity for a login or other transaction. MFA combines two or more independent credentials: what the user knows, such as a password; what the user has, such as a security token; and what the user is, using biometric verification methods.

• Multifactor authentication is a core component of an identity and access management (IAM) framework.

• The goal of MFA is to create a layered defense that makes it more difficult for an unauthorized person to access a target, such as a physical location, computing device, network, or database. If one factor is compromised or broken, the attacker still has at least one more barrier to breach before successfully breaking into the target.
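As a hedged sketch of the "what the user has" factor, below is a minimal TOTP generator of the kind authenticator apps implement (RFC 6238); the shared secret is a demo value, and real deployments handle key provisioning and verification windows separately.

import java.nio.ByteBuffer;
import java.time.Instant;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Minimal sketch: 6-digit, 30-second one-time code from a shared secret.
public class TotpSketch {
    static int totp(byte[] secret, long unixTime) throws Exception {
        long counter = unixTime / 30;                       // 30-second time step
        byte[] msg = ByteBuffer.allocate(8).putLong(counter).array();
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret, "HmacSHA1"));
        byte[] h = mac.doFinal(msg);
        int off = h[h.length - 1] & 0x0f;                   // dynamic truncation
        int bin = ((h[off] & 0x7f) << 24) | ((h[off + 1] & 0xff) << 16)
                | ((h[off + 2] & 0xff) << 8) | (h[off + 3] & 0xff);
        return bin % 1_000_000;                             // keep 6 digits
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "12345678901234567890".getBytes();  // demo secret only
        System.out.printf("%06d%n", totp(secret, Instant.now().getEpochSecond()));
    }
}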



Security policy management
• Security Policy Management is the process of identifying, implementing, and managing the rules
and procedures that all individuals must follow when accessing and using an organization’s IT
assets and resources.
• The goal of these network security policies is to address security threats and implement strategies
to mitigate IT security vulnerabilities, as well as defining how to recover from a system
compromise or when a network intrusion occurs.
• Furthermore, the policies provide guidelines to employees on what to do and what not to do. They
also define who gets access to what assets and resources, and what the consequences are for not
following the rules.
• It is important for every organization to have documented IT security policies and security policy management to help protect the organization’s data and other valuable assets.



Role based access controls
• Role-based access control (RBAC), also known as role-based security, is a mechanism that
restricts system access. It involves setting permissions and privileges to enable access to
authorized users. Most large organizations use role-based access control to provide their
employees with varying levels of access based on their roles and responsibilities. This protects
sensitive data and ensures employees can only access information and perform actions they need
to do their jobs.
• An organization may let some individuals create or modify files while providing others with
viewing permission only.
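A minimal sketch of this idea in Java is shown below; the roles and permissions are illustrative, matching the viewing-only versus create/modify example above.

import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch: roles map to permission sets; an access check tests
// whether the user's role grants the requested permission.
public class RbacSketch {
    enum Permission { VIEW_FILE, CREATE_FILE, MODIFY_FILE }
    enum Role { VIEWER, EDITOR }

    static final Map<Role, Set<Permission>> GRANTS = Map.of(
            Role.VIEWER, EnumSet.of(Permission.VIEW_FILE),
            Role.EDITOR, EnumSet.of(Permission.VIEW_FILE,
                    Permission.CREATE_FILE, Permission.MODIFY_FILE));

    static boolean isAllowed(Role role, Permission p) {
        return GRANTS.getOrDefault(role, EnumSet.noneOf(Permission.class)).contains(p);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(Role.VIEWER, Permission.MODIFY_FILE)); // false
        System.out.println(isAllowed(Role.EDITOR, Permission.CREATE_FILE)); // true
    }
}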



Monitoring and Auditing

• Monitoring ensures that policies and procedures are in place and are being followed. Auditing determines whether the monitoring program is operating as it should, and whether the policies, procedures, and controls adopted are adequate and validated as effective in reducing errors and risks.

• Auditing is focused on compliance.

• Monitoring measures compliance and success and, when necessary, offers a roadmap for improvement.



Weekly Assignment

QUESTIONS

Q.1 Illustrate the use of Hadoop.

Q.2 Open Stack is used to deploy IaaS. Elaborate.

Q.3 Describe Virtual Box and its working.

Q.4 Where is MapReduce required, and how does it work?

Q.5 List the four levels of federation.



MCQs

1. What was Hadoop written in?
a) Java (software platform)
b) Perl
c) Java (programming language)
d) Lua (programming language)

2. Which of the following platforms does Hadoop run on?
a) Bare metal
b) Debian
c) Cross-platform
d) Unix-like

3. Above the file systems comes the ________ engine, which consists of one Job Tracker, to which client applications submit Map Reduce jobs.
a) Map Reduce
b) Google
c) Functional programming
d) Facebook



MCQ

1. A ________ serves as the master and there is only one NameNode per cluster.
a) Data Node b) Name Node
c) Data block d) Replication

2. HDFS works in a __________ fashion.
a) master-worker b) master-slave
c) worker/slave d) all of the mentioned

3. Point out the correct statement.
a) Hadoop does need specialized hardware to process the data
b) Hadoop 2.0 allows live stream processing of real-time data
c) In the Hadoop programming framework output files are divided into lines or records
d) None of the mentioned



MCQ

1. __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
a) Map Reduce b) Mahout
c) Oozie d) All of the mentioned

2. Facebook Tackles Big Data With _______ based on Hadoop.
a) ‘Project Prism’ b) ‘Prism’
c) ‘Project Big’ d) ‘Project Data’

3. __________ has the world’s largest Hadoop cluster.
a) Apple
b) Datamatics
c) Facebook
d) None of the mentioned



MCQ

1. Which component serves as a dashboard for users to manage OpenStack compute, storage, and networking services?
a) Designate b) Horizon c) Glance d) Searchlight

2. Swift is Open Stack’s object storage system, while Cinder deals with block storage.
a) True b) False

3. What is Google App Engine for?
A. Google App Engine is for detecting malicious apps.
B. Google App Engine is for running web applications on Google’s infrastructure.
C. Google App Engine replaces the modern computer.
D. Google App Engine is a system to develop hardware interfaces.



Assignment Questions

• Define Hadoop technology and the importance of Hadoop.

• Identify the use of MapReduce and the phases of MapReduce.

• Describe security policy management.

• What is role-based access control?

• Describe virtual machine security.

• What is multi-factor authentication?

• Write a short note on security governance and identity and access management.

• Summarize Software as a Service security and the policies of SaaS.

• Describe the cloud data life cycle and explain its phases.

• Differentiate between authorization and authentication.

