Hadoop Updated Course Content
1) Introduction
Problems with traditional large-scale systems
Requirements for a new approach
Introduction to Existing Systems
Challenges in Traditional Databases
2) Welcome to Hadoop
Introduction to Hadoop
RDBMS Comparison with Hadoop
Motivation for Hadoop
Hadoop Terminology
3) Hadoop Eco-Systems
NameNode
DataNode
Secondary NameNode
JobTracker
TaskTracker
Hands-On Exercise
4) HDFS
HDFS Configuration
Monitoring With HDFS
HDFS Permissions and Security
Scalability
Blocks
Replication
HDFS Architecture with Distributed Nodes
HDFS Shell
Hands-On Exercise
5) MapReduce
What Is MapReduce?
Features of MapReduce
Basic MapReduce Concepts
Architectural Overview
Fault Tolerance
Hands-On Exercise
6) Planning your Hadoop Cluster
General Planning Considerations
Choosing the Right Hardware
Network Considerations
Configuring Nodes
7) Hadoop Framework Full Installation
Installation Methods
Method 1: Using a Pre-Configured Virtual Machine
Method 2: Manual Installation and Configuration
Installation on Windows/Linux Machines
HDFS Configuration
MapReduce Configuration
Hands-On Exercise
8) Advanced Configuration
Advanced Parameters
Configuring Rack Awareness
Configuring Federation
Configuring High Availability
9) Getting Started With Eclipse IDE
Configuring Hadoop File System on Eclipse IDE
Connecting Eclipse IDE to HDFS
Developing Map/Reduce jobs on Eclipse IDE
10) Writing a MapReduce Program
The MapReduce Flow
Examining a Sample MapReduce Program
Basic MapReduce API Concepts
The Driver Code
The Mapper
The Reducer
Hadoop's Streaming API
Using Eclipse for Rapid Development
Hands-On Exercise
11) MapReduce
Parallel Programming Language
MapReduce Overview and Architecture
Developing MapReduce Jobs
Input and Output Data Formats
Job Configuration with Map/Reduce functions
Job Submission on HDFS
Jobs Monitoring
12) MapReduce Advanced Programming
Partitioner
Combiner
Indexing
Searching
Sorting
Grouping/Shuffling
13) Hadoop Streaming With Mapper
14) Distributed Debugging of a Hadoop Cluster
15) Cluster Monitoring and Troubleshooting
General System Monitoring
Managing Hadoop's Log Files
Using the NameNode and JobTracker Web UIs
Hands-On Exercise
Common Troubleshooting Issues
Benchmarking Your Cluster
16) Using Yahoo Web Services
17) Hadoop Security
18) Pig
Pig Overview
Installation
Pig Latin
Pig with HDFS
Loading HDFS Data into Pig
Grunt Shell
Practices on Pig Scripting
Seeing Pig in action: example of computing similar patents
19) Hive
Hive Overview
Installation
HiveQL
Hive with HDFS
Analyzing Structured Data with Hive
Analyzing Unstructured Data with Hive
Analyzing Semi-structured Data with Hive
Practices in HiveQL
20) HBase
HBase Overview and Architecture
HBase Installation
HBase Shell
CRUD operations
Scanning and Batching
Filters
HBase Key Design
21) Sqoop
Sqoop Overview
Installation
Imports and Exports
22) HUE
The GUI System
Monitoring Data with HUE
23) Conclusion