
Hadoop Online Training Course Content

1) Introduction

Problems with traditional large-scale systems
Requirements for a new approach
Introduction to Existing Systems
Challenges in Traditional Databases

2) Welcome to Hadoop

Introduction to Hadoop
RDBMS Comparison with Hadoop
Motivation for Hadoop
Hadoop Terminology

3) Hadoop Ecosystem

NameNode
DataNode
Secondary NameNode
JobTracker
TaskTracker
Hands-On Exercise

4) HDFS

HDFS Configuration
Monitoring HDFS
HDFS Permissions and Security
Scalability
Blocks
Replication
HDFS Architecture with Distributed Nodes
HDFS Shell
Hands-On Exercise

5) MapReduce

What Is MapReduce?
Features of MapReduce
Basic MapReduce Concepts
Architectural Overview
Fault Tolerance
Hands-On Exercise

6) Planning Your Hadoop Cluster

General Planning Considerations
Choosing the Right Hardware
Network Considerations
Configuring Nodes

7) Hadoop Framework Full Installation

Installation Methods
Method 1: Using a Preconfigured Virtual Machine
Method 2: Manual Installation and Configuration
Installation on Windows/Linux Machines
HDFS Configuration
MapReduce Configuration
Hands-On Exercise

8) Advanced Configuration

Advanced Parameters
Configuring Rack Awareness
Configuring Federation
Configuring High Availability

9) Getting Started With Eclipse IDE

Configuring the Hadoop File System in Eclipse IDE
Connecting Eclipse IDE to HDFS
Developing MapReduce Jobs in Eclipse IDE

10) Writing a MapReduce Program

The MapReduce Flow
Examining a Sample MapReduce Program
Basic MapReduce API Concepts
The Driver Code
The Mapper
The Reducer
Hadoop's Streaming API
Using Eclipse for Rapid Development
Hands-On Exercise

11) MapReduce

Parallel Programming Language
MapReduce Overview and Architecture
Developing MapReduce Jobs
Input and Output Data Formats
Job Configuration with Map and Reduce Functions
Job Submission on HDFS
Job Monitoring

12) MapReduce Advanced Programming

Partitioner
Combiner
Indexing
Searching
Sorting
Grouping/Shuffling

13) Hadoop Streaming With Mapper

14) Distributed Debugging of a Hadoop Cluster

15) Cluster Monitoring and Troubleshooting

General System Monitoring
Managing Hadoop's Log Files
Using the NameNode and JobTracker Web UIs
Hands-On Exercise
Common Troubleshooting Issues
Benchmarking Your Cluster

16) Using Yahoo Web Services

17) Hadoop Security

18) Pig

Pig Overview
Installation
Pig Latin
Pig with HDFS
Loading HDFS Data into Pig
Grunt Shell
Pig Scripting Practice
Seeing Pig in Action: Example of Computing Similar Patents

19) Hive

Hive Overview
Installation
HiveQL
Hive with HDFS
Analyzing Structured Data with Hive
Analyzing Unstructured Data with Hive
Analyzing Semi-structured Data with Hive
HiveQL Practice

20) HBase

HBase Overview and Architecture
HBase Installation
HBase Shell
CRUD operations
Scanning and Batching
Filters
HBase Key Design

21) Sqoop

Sqoop Overview
Installation
Imports and Exports

22) HUE

The GUI System
Monitoring Data with HUE

23) Conclusion
