Big DATA Analytics: C.Ranichandra & N.C.Senthilkumar
C.RANICHANDRA
&
N.C.SENTHILKUMAR
CRA
NO SQL
HBase
Random R/W
HDFS and HBASE
HDFS | HBase
Distributed file system | Database built on HDFS
Provides high-latency batch processing | Provides low-latency access to single rows
Storage Mechanism in HBase
Column-oriented
The table schema defines only column families; the data inside each
family is stored as key-value pairs
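The storage layout described above can be pictured as nested key-value maps. The following is a minimal sketch only, not HBase code: the table name `emp`, the families `personal` and `professional`, and the helper `cell` are all hypothetical names chosen for illustration.

```python
# Illustration only: a nested-dict picture of a column-oriented table.
# All names here (emp, personal, professional, cell) are hypothetical.

# The schema lists column families only; column names are not declared.
schema = {"emp": ["personal", "professional"]}

# row key -> column family -> column name -> value
rows = {
    "row1": {
        "personal": {"name": "raju", "city": "hyderabad"},
        "professional": {"designation": "manager"},
    },
    "row2": {
        # This row stores fewer columns; nothing in the schema forbids that.
        "personal": {"name": "ravi"},
    },
}


def cell(table_rows, row_key, column):
    """Look up one cell addressed as 'family:name'; None if it is absent."""
    family, name = column.split(":", 1)
    return table_rows.get(row_key, {}).get(family, {}).get(name)


# Every family used by a row must appear in the schema.
used_families = {fam for row in rows.values() for fam in row}
undeclared = used_families - set(schema["emp"])
```

Rows are free to hold different columns inside the same family, which is why the schema does not need to change when a new column appears.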
Hbase and RDBMS
HBase | RDBMS
Schema-less | Schema-oriented
Built for wide tables; horizontally scalable | Thin and built for small tables; hard to scale
No transactions; suitable for OLAP | Transactional
Denormalized data | Normalized data
Good for semi-structured and structured data | Good for structured data
Applications of Hbase
Hbase Architecture
HBase Shell Commands
General:
status
version
table_help
whoami
DDL:
create
list
disable
is_disabled
enable
is_enabled
describe
alter
exists
drop_all: drops all tables whose names match a given regular expression
DML:
put: writes a cell value
get: reads a row or a cell
delete: deletes a cell value
deleteall: deletes all the cells in a row
scan: scans a table and returns its data
count: returns the number of rows in a table
truncate: disables, drops, and recreates a specified table
DDL
DML Commands
put '<table name>', '<row key>', '<column family:column name>', '<value>'
scan '<table name>'
get '<table name>', '<row key>', {COLUMN => '<column family:column name>'}
delete '<table name>', '<row key>', '<column family:column name>'
deleteall '<table name>', '<row key>'
truncate '<table name>'
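To make the effect of each DML command concrete, here is a small in-memory sketch in Python. It is not the HBase client API: `MiniTable` and its methods are hypothetical stand-ins that only mirror the command names above, and the sketch ignores cell versions and timestamps, which real HBase keeps.

```python
class MiniTable:
    """Hypothetical in-memory stand-in for a table (no versions, no timestamps)."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)   # declared column families
        self.rows = {}                  # row key -> {"family:name" -> value}

    def put(self, row, column, value):
        # A cell may only be written into a declared column family.
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {})[column] = value

    def get(self, row, column=None):
        # Without a column, return the whole row; otherwise a single cell.
        cells = self.rows.get(row, {})
        return dict(cells) if column is None else cells.get(column)

    def delete(self, row, column):
        # Remove one cell; drop the row once it has no cells left.
        cells = self.rows.get(row, {})
        cells.pop(column, None)
        if row in self.rows and not cells:
            del self.rows[row]

    def deleteall(self, row):
        # Remove every cell of the row.
        self.rows.pop(row, None)

    def scan(self):
        # Return rows in row-key order.
        return [(row, dict(self.rows[row])) for row in sorted(self.rows)]

    def count(self):
        return len(self.rows)

    def truncate(self):
        # Same visible effect as disable + drop + recreate: an empty table.
        self.rows.clear()


emp = MiniTable("emp", ["personal", "professional"])
emp.put("1", "personal:name", "raju")
emp.put("1", "professional:salary", "50000")
emp.put("2", "personal:name", "ravi")
```

After these three `put` calls, `emp.count()` is 2 and `emp.get("1", "personal:name")` returns `"raju"`; `emp.delete("1", "professional:salary")` removes a single cell, while `emp.deleteall("2")` removes the whole row.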
Example
DDL+DML Commands
Exercise-MBA Admissions
Batch Processing
Batch processing is very efficient at processing high-volume data.
Data is collected, entered into the system, and processed, and the
results are then produced in batches.
Processing time is not the primary concern here.
Batch jobs are configured to run without manual intervention and
operate on the entire dataset at scale to produce output in the
form of computational analyses and data files.
Depending on the size of the data being processed and the
computational power of the system, output can be delayed
significantly.
MapReduce
Phases
Word Count Example
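The slide content for this example did not survive extraction, so the following is only a plain-Python sketch of the usual word-count flow through the map, shuffle, and reduce phases. It does not use Hadoop; the function names and the three input lines are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle and sort: group all values by key, ordered by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Deer Bear River", "Car Car River", "Deer Car Bear"])
print(counts)  # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```

In a real cluster each map call runs on a different input split and the shuffle moves data between machines; here all three phases run in one process only to show what each phase contributes.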
Anatomy of MapReduce
MapReduce 1 (classic)
MapReduce 2 (YARN)
hadoop> start-dfs.sh
– Starts the NameNode, DataNode, and Secondary NameNode
hadoop> jps
– Lists the process ID of the NameNode and Secondary NameNode (master)
– Lists the process ID of the DataNode (slave)
hadoop> start-yarn.sh
– Starts the ResourceManager (master) and NodeManager (slave)
hadoop> jps
– Lists the process ID of the ResourceManager (master)
– Lists the process ID of the NodeManager (slave)
Execute wordcount
Classic MapReduce Framework
Four entities:
The client, which submits the job
The job tracker, which coordinates the job; a Java
application whose main class is JobTracker
The task trackers, which run the tasks that the job has
been split into; Java applications whose main class is TaskTracker
The distributed file system, used for sharing files
between the other entities
Job Submission
Task Assignment
Task Execution and Job Completion
CRA 09/07/16
NO SQL
YARN MapReduce 2
Entities
Client: submits the MapReduce job
Resource manager: coordinates the allocation of
resources on the cluster
Node manager: launches and monitors containers on the machines in the cluster
Application master: coordinates the tasks running
the MapReduce job
HDFS: shares job files between the other entities
Failures
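In classic MapReduce
Failure modes: failure of the running task, the task tracker, or
the job tracker
Task failure: a map or reduce task fails because of a runtime
exception
Task tracker failure: a task tracker fails by crashing or by running
very slowly. The job tracker notices the missing heartbeats and
removes the task tracker from its pool. Tasks that were in progress
are rescheduled on other task trackers, and completed map tasks of
incomplete jobs are rerun as well, because their intermediate output
sits on the failed task tracker's local file system and may no longer
be reachable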
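Task tracker blacklisting: the job tracker blacklists a task tracker
if more than 4 tasks from the same job fail on it
Job tracker failure: the most serious failure mode. The job tracker
is a single point of failure, and classic Hadoop has no mechanism for
dealing with its failure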
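The heartbeat check and the blacklisting rule above can be sketched in a few lines of Python. This is a simplified model, not Hadoop code: the class `JobTrackerSketch`, its method names, and the timeout value are assumptions made for illustration; only the "more than 4 failed tasks from the same job" rule comes from the text.

```python
class JobTrackerSketch:
    """Hypothetical model of how a job tracker watches its task trackers."""

    def __init__(self, heartbeat_timeout=600.0, max_failures_per_job=4):
        self.heartbeat_timeout = heartbeat_timeout      # seconds; assumed value
        self.max_failures_per_job = max_failures_per_job
        self.last_heartbeat = {}    # task tracker -> time of last heartbeat
        self.failures = {}          # (task tracker, job) -> failed task count
        self.blacklisted = set()

    def heartbeat(self, tracker, now):
        """Record that a task tracker reported in at time `now`."""
        self.last_heartbeat[tracker] = now

    def expire_trackers(self, now):
        """Remove trackers whose heartbeat is overdue; return the removed ones."""
        dead = sorted(
            tracker
            for tracker, seen in self.last_heartbeat.items()
            if now - seen > self.heartbeat_timeout
        )
        for tracker in dead:
            del self.last_heartbeat[tracker]
        return dead

    def task_failed(self, tracker, job):
        """Count a failed task; blacklist after more than the allowed number."""
        key = (tracker, job)
        self.failures[key] = self.failures.get(key, 0) + 1
        if self.failures[key] > self.max_failures_per_job:
            self.blacklisted.add(tracker)

    def available_trackers(self):
        """Trackers that are alive and not blacklisted."""
        return sorted(t for t in self.last_heartbeat if t not in self.blacklisted)


jt = JobTrackerSketch(heartbeat_timeout=10.0)
jt.heartbeat("tt1", now=0.0)
jt.heartbeat("tt2", now=0.0)
jt.heartbeat("tt2", now=8.0)          # tt2 keeps reporting, tt1 goes silent
removed = jt.expire_trackers(now=12.0)
```

Here `removed` is `["tt1"]`, since only `tt1` missed its heartbeat window. Times are passed in explicitly so the behaviour is easy to trace; a real scheduler would read the clock itself and would also reschedule the tasks of the removed tracker.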
Failures in YARN
Failure modes: task, application master, node manager,
resource manager
Task failure: handled the same way as in classic MapReduce
Application master failure: applications in YARN are
tried multiple times in the event of failure. The resource
manager detects the failure and starts a new instance of the
application master in a new container
Node manager failure: each node manager sends periodic
heartbeats to the resource manager, so the resource manager
detects the failure and removes the node from its list
Node manager blacklisting: a node manager is blacklisted if the
number of application failures on it is high
Resource manager failure: serious; the resource manager recovers
from crashes by using a checkpointing mechanism
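The "tried multiple times" behaviour for a failed application master can be sketched as a bounded retry loop. This is an illustration under stated assumptions, not YARN code: `run_application`, `flaky_master`, and the limit of 2 attempts are hypothetical choices.

```python
def run_application(start_master, max_attempts=2):
    """Start the application master, retrying in a fresh attempt on failure.

    `start_master` is a hypothetical callable that receives the attempt
    number and either returns a result or raises RuntimeError.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # Each attempt stands for a new container chosen by the resource manager.
            return attempt, start_master(attempt)
        except RuntimeError as err:
            last_error = err
    raise RuntimeError(f"application failed after {max_attempts} attempts") from last_error


def flaky_master(attempt):
    """Hypothetical master that dies on its first attempt and then succeeds."""
    if attempt == 1:
        raise RuntimeError("container lost")
    return "job finished"


result = run_application(flaky_master, max_attempts=2)
print(result)  # (2, 'job finished')
```

Once the attempts are used up the application as a whole is reported as failed, which is why the loop raises instead of retrying forever.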
Job Scheduling