HADOOP Internals
Training Contents
Description
Intended Audience
Key Skills
Prerequisites
Instructional Method
Course contents
Course Contents
Description:
This training introduces attendees to the core concepts of Hadoop, takes a deep
dive into the critical architecture paths of HDFS, MapReduce, and HBase, teaches
the basics of writing effective Pig and Hive scripts, and explains how to choose
the correct use cases for Hadoop.
Intended Audience:
Key Skills:
Prerequisites:
Participants should have a basic understanding of Java and Linux.
Instructional Method:
This is an instructor-led course that combines lectures with the practical
application of Hadoop and its underlying technologies. Most concepts are presented
pictorially, and a detailed case study strings together the technologies,
patterns, and design.
Hadoop Introduction
MapReduce
Hadoop Streaming
Ruby
Python
Streaming in Hadoop
Interfaces
Hadoop Filesystems
The Design of HDFS
Data Flow
Limitations
MapFile
SequenceFile
Serialization
Compression
Codecs
Using Compression in MapReduce
Compression and Input Splits
Data Integrity
ChecksumFileSystem
LocalFileSystem
Data Integrity in HDFS
Advanced MapReduce
Chaining MapReduce jobs
Reduce-side joining
Replicated joins using DistributedCache
Semijoin: reduce-side join with map-side filtering
MapReduce Internals
Failures
Job Scheduling
Task Execution
Failures in YARN
Failures in Classic MapReduce
Managing Hadoop
Setting permissions
Enabling trash
Adding DataNodes
Managing NameNode and Secondary NameNode
Designing network layout and rack awareness
Checking systems health
Managing quotas
Setting up parameter values for practical use
Removing DataNodes
Recovering from a failed NameNode
MapReduce Features
Counters
Sorting
Side Data Distribution
MapReduce Library
Joins
MapReduce Ecosystem
Hive
HiveQL in detail
Example queries
Hive Summary
HBase
Introduction
Clients
Concepts
HBase vs. RDBMS
Pig
Installing Pig
Running Pig
Execution optimization
Expressions and functions
Relational operators
Data types and schemas