
Inspire…Educate…Transform.

Engineering Big Data


With Hadoop & Spark Ecosystem
YARN (contd.)
Parallel Processing Platforms

Dr. Sreerama K. Murthy


CEO, Quadratyx; Mentor, INSOFE
05 May 2018

The best place for students to learn Applied Engineering http://www.insofe.edu.in


Our Focus: The open source big data (“Hadoop + Spark”) ecosystem

Layers (top to bottom):
Machine Learning on Hadoop: Spark-ML, Mahout, Samsara, H2O, Flink, R-Hadoop
Streaming & Near Real Time Processing: KAFKA, SAMZA, STORM, TRIDENT, SPARK-STREAMING, FLINK
Application Programming: PIG, Oozie, Hadoop Streaming, R-Hadoop, Spark-R
Data Organization (SQL / No SQL): HIVE, IMPALA, SQL on SPARK, Apache Drill; HBase, Cassandra, MongoDB, Neo4J, Kudu
Parallel Computing: Map-Reduce, MR2, Spark, Hama, Flink
Resource Management (OS): YARN
Storage (Persistence): HDFS
Ingestion: Sqoop, Flume, Chukwa

Cross-cutting concerns:
Serialization & Coordination: AVRO, ZooKeeper
Security, QoS, Privacy, Audit, Governance: Knox, Ranger, Sentry, Atlas, Kerberos



https://www.youtube.com/watch?v=Pu9qgnebCjs

YARN REFRESHER

YARN makes Hadoop multi-tenant



A YARN CLUSTER

http://blog.cloudera.com/blog/2012/02/mapreduce-2-0-in-hadoop-0-23/
Role of a Resource Manager



Role of a Node Manager



YARN daemons (contd.)



YARN Supports Reservation

YARN supports the notion of resource reservation, which ensures the predictable execution of important jobs.

The ReservationSystem is a component of YARN that allows users to specify a profile of resources over time, together with temporal constraints (e.g., deadlines).

The ReservationSystem tracks resources over time, performs admission control for reservations, and dynamically instructs the underlying scheduler to ensure that the reservation is fulfilled.
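
The admission-control idea can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not YARN's ReservationSystem API: the names `Reservation`, `ToyReservationSystem` and `submit` are hypothetical, time is divided into discrete slots, and the only resource is a core count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    """Hypothetical reservation: `cores` are needed during slots [start, end)."""
    name: str
    start: int
    end: int
    cores: int


class ToyReservationSystem:
    """Tracks reserved cores per time slot and performs admission control."""

    def __init__(self, total_cores, horizon):
        self.total_cores = total_cores
        self.reserved = [0] * horizon  # cores already promised in each slot
        self.accepted = []

    def submit(self, r):
        """Accept the reservation only if capacity holds in every slot it spans."""
        if not (0 <= r.start < r.end <= len(self.reserved)):
            return False  # outside the planning horizon
        slots = range(r.start, r.end)
        if any(self.reserved[t] + r.cores > self.total_cores for t in slots):
            return False  # admission control: would over-commit the cluster
        for t in slots:
            self.reserved[t] += r.cores
        self.accepted.append(r)
        return True


rs = ToyReservationSystem(total_cores=100, horizon=24)
results = [
    rs.submit(Reservation("nightly-etl", start=0, end=6, cores=60)),
    rs.submit(Reservation("ml-training", start=4, end=10, cores=50)),
    rs.submit(Reservation("report", start=6, end=8, cores=40)),
]
print(results)  # [True, False, True]
```

The second request is rejected because it overlaps the first in slots 4 and 5, where 60 + 50 cores would exceed the 100 available. A real system would also try to shift a request within its deadline rather than reject it outright.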



YARN Federation

This feature helps scale YARN beyond a few thousand nodes.

Federation transparently wires together multiple YARN (sub-)clusters and makes them appear as a single massive cluster.

This can be used to achieve larger scale, and/or to allow multiple independent clusters to be used together for very large jobs, or for tenants who have capacity across all of them.



Hadoop Evolution & YARN (contd.)



PARALLELIZATION PLATFORMS



What We Want

User easily writes intuitive instructions...

Which get auto-converted (by the Parallelization Platform) into very efficient parallel programs...

And get reliably executed on a Hadoop cluster...



Other Scenarios

Existing software tools or applications… → Parallelization Platform → Run in parallel & more efficiently on Hadoop.

In-built Parallelism
Built on Hadoop



Option 1: Map-Reduce
Job → Split Job → Tasks (T1, T2, T3)
Iterate: each task runs on a “worker” → Partial Results (r1, r2, r3)
Shuffle, Sort, Aggregate → Combine Results

Sanjay Ghemawat, Jeffrey Dean
TensorFlow, Spanner, Google News, AdSense, Bigtable, GFS, …



Option 2: Bulk Synchronous Parallel (BSP)

Leslie Valiant (1990)


Option 3: Spark

Matei Zaharia
Kostas Tzoumas, Stephan Ewen (Apache Flink)



Hadoop Processing Frameworks - Summary

Table API

Many Hadoop eco-system components employ these frameworks. Even more employ the ideas.

Platforms
• Apache Hama (2012)
• 2005 for MR, 2010 for MR2
• 2014, 2014 (platform names not present in the text)

Frameworks
• Pregel: “A System for Large Scale Graph Processing” (2010)
• General-purpose implementation on HDFS
• General-purpose implementation on HDFS

Abstractions
• BSP: “Bulk Synchronous Parallel processing” (1990, Les Valiant)
• Map Reduce: “Simplified Data Processing on Large Clusters” (2004, Jeffrey Dean, Sanjay Ghemawat)
• RDDs: “A fault tolerant abstraction for in-memory cluster computing” (2012, Zaharia, Chowdhury, Das, etc.)
• Distributed Streaming Data Flows



Details of Parallel Processing Engines

MAP REDUCE & MR2



Map Reduce
Map-Reduce: Solving large data problems. 2004



The Hello World of MapReduce



Hello World - continued



MR Keys and Values



Keys and Values (contd.)

• Programmers specify two functions:
map (k, v) → <k’, v’>*
reduce (k’, [v’]) → <k’, v’>*
– All values with the same key are reduced together, so reduce receives a key and the list of values emitted for it
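
As an illustration of these two signatures, here is a minimal single-process sketch of word count. It is not the Hadoop API: `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical names, and the "shuffle" is just an in-memory grouping by key.

```python
from collections import defaultdict


def map_fn(key, value):
    """map (k, v) -> <k', v'>*: key is a line number, value is the line text."""
    for word in value.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """reduce (k', [v']) -> <k', v'>*: sum all counts emitted for one word."""
    yield key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: each input record may emit zero or more (k', v') pairs.
    intermediate = []
    for k, v in records:
        intermediate.extend(map_fn(k, v))

    # Shuffle: group all values that share the same intermediate key.
    groups = defaultdict(list)
    for k2, v2 in intermediate:
        groups[k2].append(v2)

    # Reduce phase: keys are visited in sorted order, one call per key.
    output = []
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output


lines = [(0, "the quick brown fox"), (1, "the lazy dog"), (2, "the fox")]
counts = run_mapreduce(lines, map_fn, reduce_fn)
print(counts)
```

On a real cluster the map calls run in parallel on different input splits and the grouping step moves data across the network, but the contract between the two functions is the same.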



MR2 Shuffle on YARN



MR Usage at Google



Map Reduce Stages



MR Workflow

[Figure: MR workflow diagram. Surviving annotations: “1:many”, “Mini Reducer”, “Default – Hash Partitioner”.]
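
The "Mini Reducer" and "Hash Partitioner" annotations most likely refer to the combiner and the default partitioner. The sketch below shows both ideas in plain Python under that assumption. The function names are hypothetical, and `zlib.crc32` is used only as a repeatable stand-in for the key's hash code (Hadoop's default `HashPartitioner` uses the key's `hashCode()` modulo the number of reduce tasks).

```python
import zlib
from collections import Counter


def map_words(line):
    """Mapper: one input line produces many (word, 1) pairs (1:many)."""
    return [(w, 1) for w in line.split()]


def combine(pairs):
    """'Mini reducer': pre-aggregate one mapper's output before the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def hash_partition(key, num_reducers):
    """Hash partitioner: the same key always maps to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


num_reducers = 3
mapper_outputs = [map_words("a b a a"), map_words("b c b")]

# Combining shrinks each mapper's output, so less data crosses the network.
combined = [combine(out) for out in mapper_outputs]

# Route every combined pair to the reducer chosen by the partitioner.
partitions = {r: [] for r in range(num_reducers)}
for out in combined:
    for word, count in out:
        partitions[hash_partition(word, num_reducers)].append((word, count))

# Each reducer sums the pairs in its own partition.
totals = Counter()
for pairs in partitions.values():
    for word, count in pairs:
        totals[word] += count

print(combined)
print(dict(totals))
```

A combiner is only safe when the reduce operation is associative and commutative, as summation is here; otherwise pre-aggregating on the map side could change the result.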



InputFormat

Record Reader, Record Reader

Same with Output Format & Record Writers



Other PPP Concepts

• Map Reduce
– MR1: Job Tracker, Task Tracker
– Distributed Cache
– MR Design Patterns
• BSP
– Vote to Halt
– Think like a vertex
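
The two BSP items can be sketched with a Pregel-style toy example in which every vertex learns the largest value in its connected component. This is an illustrative single-process model, not the Apache Hama or Pregel API: `run_bsp_max` is a hypothetical name, and "vote to halt" is modelled by a vertex sending no messages, after which it stays inactive until a message arrives.

```python
def run_bsp_max(graph, initial_values):
    """Propagate the maximum value through the graph in synchronized supersteps.

    graph: dict mapping each vertex to a list of its neighbours.
    initial_values: dict mapping each vertex to its starting value.
    """
    values = dict(initial_values)
    active = set(graph)                 # every vertex starts active
    inbox = {v: [] for v in graph}
    superstep = 0

    while active:
        outbox = {v: [] for v in graph}
        for v in active:
            # "Think like a vertex": only local state and received messages.
            new_value = max([values[v]] + inbox[v])
            if superstep == 0 or new_value > values[v]:
                values[v] = new_value
                for neighbour in graph[v]:
                    outbox[neighbour].append(new_value)
            # Otherwise the vertex votes to halt by sending nothing.

        # Barrier: messages become visible only in the next superstep,
        # and a halted vertex is reactivated only if it has mail.
        inbox = outbox
        active = {v for v in graph if inbox[v]}
        superstep += 1

    return values, superstep


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": []}
values, supersteps = run_bsp_max(graph, {"a": 3, "b": 6, "c": 2, "d": 1, "e": 9})
print(values)
```

The computation ends when every vertex has halted and no messages are in flight, which is the usual termination condition in this model. The isolated vertex `e` keeps its own value because it never receives a message.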



This presentation may contain references to findings of various reports available in the public domain. INSOFE makes no representation as to their accuracy or that the organization
subscribes to those findings.
