
HADOOP Internals

Training Contents

Description
Intended Audience
Key Skills
Prerequisites
Instructional Method
Course Contents


HADOOP Internals
Course Contents

Day 1
Hadoop Introduction
MapReduce
Distributing Data with HDFS

Day 2
Understanding Hadoop I/O
Advanced MapReduce
Writing Map-Reduce Applications

Day 3
Map-Reduce Internals
Managing Hadoop
Map-Reduce Features
Map-Reduce Ecosystem


Description:

This training introduces attendees to the core concepts of Hadoop. It takes a
deep dive into the critical architecture paths of HDFS, MapReduce, and HBase,
teaches the basics of writing effective Pig and Hive scripts, and explains how
to choose the right use cases for Hadoop.

Intended Audience:

Engineers, Programmers, Networking specialists, Managers, Executives

Key Skills:

The attendees will learn:

Big Data and the Hadoop Ecosystem
Hadoop Distributed File System (HDFS)
Using the MapReduce API and writing common algorithms
Advanced MapReduce concepts and algorithms
Importing and exporting data
Best practices for developing and debugging MapReduce programs
Hadoop best practices, tips, and techniques
Managing and monitoring a Hadoop cluster

Prerequisites:

Participants should have a basic understanding of Java and Linux.

Instructional Method:

This is an instructor-led course that combines lectures with the practical
application of Hadoop and its underlying technologies. Most concepts are
presented pictorially, and a detailed case study ties together the
technologies, patterns, and design.

HADOOP Internals

Hadoop Introduction

Move computation, not data
Volunteer Computing
Grid Computing
Hadoop Releases
Hadoop performance and data-scale facts
The Apache Hadoop Project
Hadoop in the context of other data stores
Apache Hadoop and the Hadoop Ecosystem
A Brief History of Hadoop
Hadoop, an inside view: MapReduce and HDFS
What about NoSQL?
RDBMS
Comparison with Other Systems

MapReduce

Constructing the basic template of a MapReduce program (sketched below)
Running a Distributed MapReduce Job
Data Flow
Combiner Functions
Java MapReduce
Scaling Out
Counting things
Analyzing the Data with Hadoop
Map and Reduce
Hadoop Pipes
Adapting to Hadoop's API changes
Improving performance with combiners
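
As a taste of the basic-template topic above, here is a minimal word-count
sketch against the org.apache.hadoop.mapreduce API; the class and field names
are illustrative, not part of the course materials.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

        // Mapper: emits (word, 1) for every token in the input line.
        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);
                    }
                }
            }
        }

        // Reducer: sums the counts per word; also usable as a combiner.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }

Because this reducer is associative and commutative, registering it as the
job's combiner (job.setCombinerClass(SumReducer.class)) is the usual first step
in "Improving performance with combiners".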

Hadoop Streaming

Ruby
Python

Streaming in Hadoop

Distributing Data with HDFS

Interfaces
Hadoop Filesystems
The Design of HDFS

Using Hadoop Archives

Anatomy of a File Write


Anatomy of a File Read
Coherency Model

The Command-Line Interface

Keeping an HDFS Cluster Balanced


Hadoop Archives

Data Flow

Limitations

Parallel Copying with distcp

Basic Filesystem Operations

The Java Interface

Streaming with key/value pairs


Streaming with Unix commands
Streaming with the Aggregate package
Streaming with scripts

Querying the Filesystem


Reading Data Using the FileSystem API (sketched below)
Directories
Deleting Data
Reading Data from a Hadoop URL
Writing Data
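
To ground the FileSystem API topics above, a minimal sketch of reading a file
from HDFS and copying it to standard output; the URI is a placeholder.

    import java.io.InputStream;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsCat {
        public static void main(String[] args) throws Exception {
            // e.g. hdfs://namenode/user/train/sample.txt (placeholder path)
            String uri = args[0];
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(uri), conf);
            InputStream in = null;
            try {
                in = fs.open(new Path(uri));   // returns an FSDataInputStream
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                IOUtils.closeStream(in);
            }
        }
    }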

Understanding Hadoop I/O


File-Based Data Structures

MapFile
SequenceFile

Serialization

Compression

Codecs
Using Compression in MapReduce
Compression and Input Splits

Data Integrity

Implementing a Custom Writable (see the sketch at the end of this section)


Serialization Frameworks
The Writable Interface
Writable Classes
Avro

ChecksumFileSystem
LocalFileSystem
Data Integrity in HDFS
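
As a taste of the custom Writable topic, a minimal two-field value type; the
class name is illustrative.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Hadoop calls write() to serialize and readFields() to deserialize.
    public class IntPairWritable implements Writable {
        private int first;
        private int second;

        public IntPairWritable() { }   // Writables need a no-arg constructor

        public void set(int first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(first);
            out.writeInt(second);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first = in.readInt();
            second = in.readInt();
        }
    }

A type used as a key must also implement WritableComparable so the framework
can sort it during the shuffle.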

Advanced MapReduce
Chaining MapReduce jobs

Creating a Bloom filter

What does a Bloom filter do?


Bloom filter in Hadoop version 0.20+
Implementing a Bloom filter (sketched at the end of this section)

Joining data from different sources

Chaining preprocessing and postprocessing steps


Chaining MapReduce jobs in a sequence
Chaining MapReduce jobs with complex dependency

Reduce-side joining
Replicated joins using DistributedCache
Semijoin: reduce-side join with map-side filtering
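
For the Bloom filter topics, a minimal sketch using the
org.apache.hadoop.util.bloom package that ships with Hadoop 0.20+; the sizing
parameters are arbitrary.

    import org.apache.hadoop.util.bloom.BloomFilter;
    import org.apache.hadoop.util.bloom.Key;
    import org.apache.hadoop.util.hash.Hash;

    public class BloomFilterDemo {
        public static void main(String[] args) {
            // vectorSize and nbHash trade memory against the false-positive rate.
            BloomFilter filter = new BloomFilter(10000, 6, Hash.MURMUR_HASH);

            filter.add(new Key("user123".getBytes()));

            // May return a false positive, but never a false negative.
            System.out.println(filter.membershipTest(new Key("user123".getBytes())));
        }
    }

In the semijoin pattern, a filter built from the smaller dataset is shipped to
the mappers (for example via the DistributedCache) so records that cannot
possibly join are dropped before the shuffle.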

Writing Map-Reduce Applications

Hadoop in the Cloud


Cluster Setup and Installation
Hadoop Configuration
YARN Configuration

The Configuration API (sketched at the end of this section)


Running Locally on Test Data
Configuring the Development Environment
Cluster Specs
Tuning
MapReduce Workflows
Monitoring and debugging on a production cluster
Tuning for performance
Benchmarking a Hadoop Cluster
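
A minimal sketch of the Configuration API; the resource name follows the
standard Hadoop convention, and the two properties are common built-ins.

    import org.apache.hadoop.conf.Configuration;

    public class ConfigDemo {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.addResource("core-site.xml");  // later resources override earlier ones

            // get() returns the supplied default when the property is unset.
            String fsUri = conf.get("fs.defaultFS", "file:///");
            int bufferSize = conf.getInt("io.file.buffer.size", 4096);

            System.out.println(fsUri + " / " + bufferSize);
        }
    }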

Map-Reduce Internals
Failures

Anatomy of a MapReduce Job Run

Skipping Bad Records


Output Committers
The Task Execution Environment
Speculative Execution (see the configuration sketch at the end of this section)
Task JVM Reuse

Job Scheduling

The Reduce Side


The Map Side
Configuration Tuning

Task Execution

Classic MapReduce (MapReduce 1)


YARN (MapReduce 2)

Shuffle and Sort

Failures in YARN
Failures in Classic MapReduce

The Capacity Scheduler


The Fair Scheduler
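
As a small illustration of the speculative-execution and JVM-reuse topics, both
can be toggled through job configuration; the property names below are the
classic MapReduce 1 ones.

    import org.apache.hadoop.conf.Configuration;

    public class TaskTuningDemo {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // Disable speculative execution for map tasks (MR1 property name).
            conf.setBoolean("mapred.map.tasks.speculative.execution", false);

            // Reuse each task JVM for an unlimited number of tasks (MR1).
            conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
        }
    }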

Managing Hadoop

Setting permissions

Enabling trash
Adding DataNodes
Managing NameNode and Secondary NameNode
Designing network layout and rack awareness
Checking systems health
Managing quotas
Setting up parameter values for practical use
Removing DataNodes
Recovering from a failed NameNode

Map-Reduce Features

Counters (sketched below)
Sorting
Side Data Distribution
Map-Reduce Library
Joins
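
For the Counters topic, a minimal sketch of a custom counter incremented from
inside a mapper; the group and counter names are illustrative.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class RecordQualityMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value.toString().split(",").length < 3) {
                // Counters roll up across all tasks and print with the job summary.
                context.getCounter("RecordQuality", "MALFORMED").increment(1);
                return;
            }
            context.write(value, NullWritable.get());
        }
    }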

Map-Reduce Ecosystem
Hive

Installing and configuring Hive

HiveQL in details
Example queries (see the JDBC sketch below)
Hive Sum-up
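
As one way to run the example queries, a minimal sketch that submits HiveQL
through the HiveServer2 JDBC driver; the host, port, credentials, and the
words table are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryDemo {
        public static void main(String[] args) throws Exception {
            // Registers the HiveServer2 driver (auto-registered under JDBC 4).
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "hive", "");
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(
                    "SELECT word, count(*) FROM words GROUP BY word");
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
            conn.close();
        }
    }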

HBase

Introduction
Clients
Concepts
HBase vs. RDBMS

Pig

Installing Pig

Running Pig

Learning Pig Latin through Grunt


Managing the Grunt shell

Thinking like a Pig

Data flow language


User-defined functions
Data types

Speaking Pig Latin

Execution optimization
Expressions and functions
Relational operators
Data types and schemas

Mobile: +91 7719882295 / 9730463630
Email: sales@anikatechnologies.com
Website: www.anikatechnologies.com
