2-HadoopArchitecture HDFS
• Please log in 10 minutes before the class starts and check your internet connection to avoid any network issues during the LIVE session
• All participants will be muted by default to avoid background noise; the instructor will unmute you if required. Please use the “Questions” tab in your webinar tool to interact with the instructor at any point during the class
• Feel free to ask and answer questions to make your learning interactive. The instructor will address your queries at the end of the ongoing topic
• If you want to connect to your Personal Learning Manager (PLM), dial +91 7618772501
• We have a dedicated support team to assist with all your queries. You can reach us anytime on the numbers below:
US: 1855 818 0063 (Toll-Free) | India: +91 9019117772
• Your feedback is much appreciated. Please share feedback after each class, which will help us enhance your learning experience
▪ HDFS Architecture
▪ What is HDFS?
[Diagram: a fully-distributed Hadoop cluster. The Master node runs the NameNode (web UI at http://master:50070/) and the ResourceManager (web UI at http://master:8088). Each of the slave nodes, Slave01 through Slave05, runs a DataNode and a NodeManager.]
[Diagram: HDFS Federation alongside YARN. On the HDFS side, multiple NameNodes each manage an independent Namespace, with a ViewFS map routing paths to namespaces; the Storage layer is shared, and each DataNode (DN1 … DNn) stores blocks for the namespaces. On the YARN side, the ResourceManager manages compute.]
Ans. The put will fail. None of the namespaces will manage the file, and you
will get an IOException with a “No such file or directory” error.
[Diagram: HDFS High Availability. All namespace edits are logged by the single writer, the Active NameNode, to Shared Edit Logs on shared NFS storage (fencing ensures only one writer); the Standby NameNode reads the edit logs and applies them to its own namespace. The Client interacts with the Active NameNode. A Secondary NameNode is shown alongside for comparison.]
[Diagram: YARN architecture. The Masters run the ResourceManager, which comprises the Applications Manager (AsM) and the Scheduler. The Slaves each run a DataNode and host Containers, one of which runs the Application Master for a job.]
▪ We use Hadoop to store copies of internal log and dimension data sources and use
it as a source for reporting/analytics and machine learning.
• Standalone (local) mode has no DFS; everything runs against the local file system.
Fully-Distributed Mode
Configuration
Hadoop Configuration Files
Filename | Description
hadoop-env.sh | Environment variables that are used in the scripts to run Hadoop.
core-site.xml | Configuration settings for Hadoop Core, such as I/O settings that are common to HDFS and MapReduce.
hdfs-site.xml | Configuration settings for the HDFS daemons: the NameNode, the Secondary NameNode and the DataNodes.
masters | A list of machines (one per line) that each run a Secondary NameNode.
slaves | A list of machines (one per line) that each run a DataNode and a NodeManager.
HDFS hdfs-site.xml
YARN yarn-site.xml
MapReduce mapred-site.xml
<value>hdfs://nameservice1</value>
</property>
</configuration>
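The XML above is only the tail of a configuration entry; the property name has been cut off. As a hedged sketch of what a complete entry of this shape looks like (the property name fs.defaultFS is the standard Hadoop 2.x key for the default file system; the nameservice ID hdfs://nameservice1 is taken from the fragment itself), a core-site.xml pointing clients at a logical HA nameservice might read:

```xml
<?xml version="1.0"?>
<!-- core-site.xml: point clients at the logical HA nameservice
     rather than at a single NameNode host. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nameservice1</value>
  </property>
</configuration>
```

Using the nameservice ID instead of a hostname lets clients fail over transparently between the Active and Standby NameNodes.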
2. https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
3. https://hadoop.apache.org/docs/r2.8.5/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
4. https://hadoop.apache.org/docs/r2.8.5/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
Slaves
▪ Contains a list of hosts, one per line, that are to host DataNode
and NodeManager services.
Masters
▪ Contains a list of hosts, one per line, that are to host Secondary
NameNode servers.
▪ This file also offers a way to provide custom parameters for each of the servers.
▪ hadoop-env.sh is sourced by all of the Hadoop Core scripts, found inside the Hadoop configuration directory:
/opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop
▪ Examples of environment variables that you can specify:
▪ export HADOOP_HEAPSIZE="512"
▪ export HADOOP_DATANODE_HEAPSIZE="128"
Daemon | Runs on | Port | Protocol | Description
NameNode | Master nodes (NameNode and any back-up NameNodes) | 50070 | http | Web UI to look at the current status of HDFS and explore the file system
DataNode | All slave nodes | 50075 | http | DataNode Web UI to access the status, logs, etc.
ResourceManager | Master nodes | 8088 | http | Web UI to track applications and cluster resources
JobHistoryServer | Master nodes | 19888 | http | Web UI to browse completed MapReduce job history
Ans. False.
A detailed answer will be given after the next question.
Ans. True.
It is stored in different part files, e.g. part-m-00000, part-m-00001,
and so on.
HDFS
Data Loading
copyFromLocal: Similar to “put” command, except that the source is restricted to a local file reference.
distcp: Distributed Copy to move data between clusters, used for backup and recovery
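As a sketch of the data-loading commands above (the paths and cluster hostnames here are hypothetical, and a running HDFS cluster is assumed):

```shell
# put: copy a local file (or stdin) into HDFS
hdfs dfs -put /tmp/events.log /data/logs/

# copyFromLocal: like put, but the source must be a local file reference
hdfs dfs -copyFromLocal /tmp/events.log /data/logs/

# distcp: distributed copy of data between clusters (backup and recovery)
hadoop distcp hdfs://clusterA:8020/data/logs hdfs://clusterB:8020/backup/logs
```

distcp runs as a MapReduce job, so large copies are parallelised across the cluster rather than streamed through a single client.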
[Diagram: streaming data ingestion. Data flows from Twitter through the Streaming API into Flume, which writes it to HDFS.]
Ans. FLUME.
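As a hedged sketch of how such a Twitter-to-HDFS pipeline is wired in Flume (the agent name a1, channel sizing, and HDFS path are hypothetical placeholders; the Twitter source class is the experimental one shipped with Apache Flume, and real credentials must be supplied):

```properties
# flume.conf — one agent: Twitter source -> memory channel -> HDFS sink
a1.sources = twitter
a1.channels = mem
a1.sinks = hdfs-sink

# Experimental Twitter source shipped with Apache Flume
a1.sources.twitter.type = org.apache.flume.source.twitter.TwitterSource
a1.sources.twitter.consumerKey = <your-key>
a1.sources.twitter.consumerSecret = <your-secret>
a1.sources.twitter.accessToken = <your-token>
a1.sources.twitter.accessTokenSecret = <your-token-secret>

# Buffer events in memory between source and sink
a1.channels.mem.type = memory
a1.channels.mem.capacity = 10000

# Write events into date-partitioned HDFS directories
a1.sinks.hdfs-sink.type = hdfs
a1.sinks.hdfs-sink.hdfs.path = hdfs://nameservice1/flume/tweets/%Y/%m/%d
a1.sinks.hdfs-sink.hdfs.fileType = DataStream
a1.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true

# Wire source and sink to the channel
a1.sources.twitter.channels = mem
a1.sinks.hdfs-sink.channel = mem
```

The agent is started with flume-ng agent --name a1 --conf-file flume.conf; the channel decouples the ingest rate from the HDFS write rate.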
http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ClusterSetup.html
http://www.edureka.in/blog/install-apache-hadoop-cluster/
http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-Win-1.3.0/bk_cluster-planning-guide/content/ch_hardware-recommendations.html
http://www.edureka.in/blog/hadoop-cluster-configuration-files/
http://www.edureka.in/blog/anatomy-of-a-mapreduce-job-in-apache-hadoop/
http://www.edureka.in/blog/commissioning-and-decommissioning-nodes-in-a-hadoop-cluster/
▪ Secondary NameNode
https://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Secondary_NameNode
http://www.edureka.in/blog/hadoop-interview-questions-hadoop-cluster/