Professional Documents
Culture Documents
HDFS Architecture: © Hortonworks Inc. 2011 - 2018. All Rights Reserved
HDFS Architecture: © Hortonworks Inc. 2011 - 2018. All Rights Reserved
Hadoop
Client HDFS
Hadoop
Client HDFS
⬢ Divide files into big blocks and distribute across the cluster
⬢ Store multiple replicas of each block for reliability
⬢ Programs can ask "where do the pieces of my file live?”
10110100101 Cluster
00100111001
1
11111001010
01110100101
00101100100 1 3 2
10101001100
2
01010010111
01011101011 4 1 3
11011011010
Blocks 10110100101
01001010101
3
01011100100
11010111010 2 4
0 4 2 3
1
4
Logical File
⬢ NameNode
– Is the master service of HDFS
– Determines and maintains how the chunks of data are distributed across the DataNodes
⬢ DataNode
– Stores the chunks of data, and is responsible for replicating the chunks across other DataNodes
NameNode
NameNode
3. For every block, the client will request the NameNode to provide a new blockid and a list of
destination DataNodes.
4. The client will write the block directly to the first DataNode in the list.
5. The first DataNode pipelines the replication to the next DataNode in the list.
maximize Minimize
availability minimize write cost
DataNode DataNode
write cost DataNode
blk1 blk2
minimize
write cost
DataNode maximize
DataNode DataNode Maximize
availability availability
write blk2 blk2 and read
performance
⬢ Security
– Classic POSIX permissioning (ex: -rwxr-xr--)
– Extended Access Control Lists (ACL) for richer scenarios
– Centralized authorization policies and audit available via Ranger plug-in
⬢ Quotas
– Easy to understand data size quotas
– Additional option for controlling the number of files
1. HDFS breaks files into _______ and persists multiple ________ across
the cluster to aid in the file system’s _______ and the to help
programs obtain ___________.
1. HDFS breaks files into _______ and persists multiple ________ across
the cluster to aid in the file system’s _______ and the to help
programs obtain ___________.
2. What is the primary master node service?
1. HDFS breaks files into _______ and persists multiple ________ across
the cluster to aid in the file system’s _______ and the to help
programs obtain ___________.
2. What is the primary master node service?
3. What is the worker node service?
1. HDFS breaks files into _______ and persists multiple ________ across
the cluster to aid in the file system’s _______ and the to help
programs obtain ___________.
2. What is the primary master node service?
3. What is the worker node service?
4. True/False? Clients avoid writing data through the NameNode.
1. HDFS breaks files into _______ and persists multiple ________ across
the cluster to aid in the file system’s _______ and the to help
programs obtain ___________.
2. What is the primary master node service?
3. What is the worker node service?
4. True/False? Clients avoid writing data through the NameNode.
5. True/False? Clients write replica copies directly to each DataNode.
⬢ HDFS breaks files into blocks and replicates them for reliability and processing data
locality
⬢ The primary components are the master NameNode service and the worker
DataNode service
⬢ The NameNode is a memory-based service
⬢ The NameNode automatically takes care of recovery missing and corrupted blocks
⬢ Clients interact with the NameNode to get a list, for each block, of DataNodes to
write data to