Module 02 HDFS - Hadoop Distributed File System
Copyright © 2019 Huawei Technologies Co., Ltd. All rights reserved. Page 2
CONTENTS
01 HDFS Overview and Application Scenarios
02 Position of HDFS in FusionInsight HD
03 HDFS System Architecture
04 Key Features
Dictionary vs. File System
HDFS Overview
The Hadoop Distributed File System (HDFS) was developed based on the Google File
System (GFS) and runs on commodity hardware.
In addition to the features provided by other distributed file systems, HDFS also
provides the following features:
• High fault tolerance: resolves hardware unreliability problems.
HDFS Application Scenarios
Position of HDFS in FusionInsight
[Figure: FusionInsight HD stack. Labels: application service layer; Hadoop API, Plugin API; Hive, M/R, Spark, Storm, Flink; Hadoop YARN / ZooKeeper, LibrA; HDFS / HBase; system management, service governance, security management.]
Basic System Architecture
HDFS Architecture
[Figure: HDFS architecture. The NameNode holds metadata (name, replicas, …), for example /home/foo/data,3,…. Labels: clients, metadata ops, block ops, replication, blocks, Rack 1, Rack 2.]
HDFS Data Write Process
[Figure: HDFS data write process. Components: HDFS client (on the client node), Distributed File System, NameNode, three DataNodes. Step labels: 1: create, 2: create, 3: write, 4 and 5 (along the DataNodes), 7: complete.]
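The numbered steps can be read as a pipeline: the client hands each packet to the first DataNode, which forwards it down the chain, and acknowledgements travel back in reverse order. The Python sketch below models that flow in memory only. The slide leaves steps 4 and 5 unlabeled, so their meaning here (forward a packet, acknowledge a packet) is an assumption taken from the standard HDFS write path. The names `DataNode`, `write_packet`, and `build_pipeline` are hypothetical and are not the HDFS client API.

```python
class DataNode:
    """Toy DataNode that stores packets and forwards them downstream."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream  # next DataNode in the pipeline, or None
        self.packets = []

    def write_packet(self, packet):
        # Step 4 (assumed meaning): store locally, then forward to the next node.
        self.packets.append(packet)
        acks = []
        if self.downstream is not None:
            acks = self.downstream.write_packet(packet)
        # Step 5 (assumed meaning): acknowledgements flow back toward the
        # client, so the last node in the chain acknowledges first.
        return acks + [self.name]


def build_pipeline(names):
    """Chain DataNodes so that names[0] is the node the client talks to."""
    head = None
    for name in reversed(names):
        head = DataNode(name, downstream=head)
    return head


pipeline = build_pipeline(["dn1", "dn2", "dn3"])
ack_order = pipeline.write_packet(b"packet-0")
print(ack_order)
```

Each node stores the packet before it acknowledges, so by the time the client sees the full acknowledgement list every replica in the chain holds the data.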
HDFS Data Read Process
[Figure: HDFS data read process. Surviving labels: client node, 4: read, 5: read.]
Key Design of HDFS Architecture
• NameNode / DataNode in master / slave mode
• Federation storage
HDFS High Availability (HA)
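[Figure: HDFS HA. An active and a standby NameNode, each paired with a ZKFC that exchanges heartbeats. The active NameNode writes the EditLog to three JournalNodes (JN) and the standby NameNode reads it; the FSImage is synchronized between the two. Clients send metadata operations to the active NameNode and read and write data on the DataNodes. Other labels: block operation, copy, heartbeat.]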
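The shared EditLog is what lets the standby NameNode follow the active one. A minimal Python sketch of that idea follows, assuming the usual quorum rule that an edit counts as written once a majority of JournalNodes have stored it. The names `JournalNode`, `write_edit`, and `read_edits` are hypothetical, and the read side is simplified: it does not verify that each edit reached a quorum.

```python
class JournalNode:
    """Toy JournalNode: keeps edits in a list and may be offline."""

    def __init__(self, online=True):
        self.online = online
        self.edits = []

    def append(self, txid, op):
        if not self.online:
            return False
        self.edits.append((txid, op))
        return True


def write_edit(journal_nodes, txid, op):
    """Active NameNode side: succeed only if a majority of JNs accept."""
    acks = sum(1 for jn in journal_nodes if jn.append(txid, op))
    return acks > len(journal_nodes) // 2


def read_edits(journal_nodes, after_txid):
    """Standby side: collect edits newer than after_txid from online JNs."""
    seen = {}
    for jn in journal_nodes:
        if jn.online:
            for txid, op in jn.edits:
                if txid > after_txid:
                    seen[txid] = op
    return [seen[t] for t in sorted(seen)]


jns = [JournalNode(), JournalNode(), JournalNode(online=False)]
ok = write_edit(jns, 1, "mkdir /data")
standby_view = read_edits(jns, after_txid=0)
print(ok, standby_view)
```

With three JournalNodes, one can be down and writes still succeed; with two down, `write_edit` returns `False` and the active NameNode cannot record the change.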
Metadata Persistence
1. Rolls the EditLog.
[Figure labels: EditLog, FSImage.]
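The slide names two artifacts, the EditLog and the FSImage, and only the first step (rolling the EditLog) survives. As a rough illustration of how the two relate, the Python sketch below rolls a log and then replays the finished segment onto a copy of the image to produce a checkpoint. The operation tuples and the function names `roll_edit_log` and `apply_edits` are assumptions made for the example, not the on-disk HDFS formats.

```python
def roll_edit_log(edit_log):
    """Step 1: close the current log segment and start an empty one."""
    finished_segment = list(edit_log)
    edit_log.clear()
    return finished_segment


def apply_edits(fsimage, edits):
    """Replay a finished segment onto a copy of the image (a checkpoint)."""
    new_image = dict(fsimage)
    for op, path, value in edits:
        if op == "create":
            new_image[path] = value      # value is the replication factor
        elif op == "set_replication":
            new_image[path] = value
        elif op == "delete":
            new_image.pop(path, None)
        else:
            raise ValueError(f"unknown edit op: {op}")
    return new_image


# Hypothetical starting image: path -> replication factor.
fsimage = {"/home/foo/data": 3}
edit_log = [
    ("create", "/tmp/a", 3),
    ("set_replication", "/tmp/a", 2),
    ("delete", "/home/foo/data", None),
]

segment = roll_edit_log(edit_log)
new_fsimage = apply_edits(fsimage, segment)
print(new_fsimage)
```

Rolling first means new edits go to a fresh segment while the old one is merged, so the namespace stays writable during the checkpoint.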
HDFS Federation
[Figure: HDFS Federation. Labels: APP, Client-1, Client-k, Client-n, common storage.]
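With Federation, several NameNodes each manage their own namespace on top of the common storage, so a client has to know which NameNode owns a given path. One way to picture this is a client-side mount table with longest-prefix matching, sketched below in Python. The mount points and the identifiers `nn1` to `nn3` are hypothetical, and the function is an illustration rather than an HDFS API.

```python
MOUNT_TABLE = {
    "/user": "nn1",       # hypothetical NameNode identifiers
    "/data": "nn2",
    "/data/logs": "nn3",
}


def resolve_namenode(path, mount_table=MOUNT_TABLE):
    """Return the NameNode whose mount point is the longest prefix of path."""
    best = None
    for mount_point in mount_table:
        # Match whole path components only, so "/database" is not "/data".
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError(f"no namespace mounted for {path}")
    return mount_table[best]


print(resolve_namenode("/data/logs/2019/app.log"))
print(resolve_namenode("/data/tables/t1"))
```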
Data Replication
[Figure: replica placement policy within a data center, with example node distances: Distance=0, Distance=4.]
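The distances 0 and 4 are consistent with counting hops up to the closest common ancestor in a data center / rack / node tree. The Python sketch below computes that value, assuming each location is written as `/datacenter/rack/node`. The location strings are hypothetical and the function is a simplified illustration, not Hadoop's own topology code.

```python
def distance(loc_a, loc_b):
    """Hops from each location up to their closest common ancestor, summed."""
    a = loc_a.strip("/").split("/")
    b = loc_b.strip("/").split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


print(distance("/d1/r1/n1", "/d1/r1/n1"))  # same node
print(distance("/d1/r1/n1", "/d1/r1/n2"))  # same rack, different node
print(distance("/d1/r1/n1", "/d1/r2/n3"))  # same data center, different rack
```

Under this model the same node gives 0, two nodes in one rack give 2, and two nodes in different racks of one data center give 4. A smaller distance means a cheaper transfer, which is why a placement policy cares about it.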
Colocation
Colocation stores associated data, or data that is going to be associated, on the same
storage node.
In the figure below, assume that file A and file D are going to be associated with each other.
Without colocation, this involves massive data migration. The data transmission consumes much
bandwidth, which greatly slows the processing of massive data and degrades system performance.
[Figure: a NameNode (NN) and six DataNodes (DN1 to DN6) holding blocks of files A, B, C and D, without colocation.]
Colocation Benefits
HDFS colocation stores files that need to be associated with each other on the same DataNode, so
that associated computing does not have to fetch data from other nodes. This greatly reduces
network bandwidth consumption.
When files A and D are joined with the colocation feature, resource consumption drops sharply because the
blocks of the associated files are placed on the same storage node.
[Figure: a NameNode (NN) and six DataNodes (DN1 to DN6) holding blocks of files A, B, C and D, with colocation.]
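The bandwidth argument can be made concrete with a small calculation. The Python sketch below counts how many blocks of file D would have to cross the network to be joined with the matching block of file A, with and without colocation. The block layouts and the 128 MB block size are assumptions chosen for illustration; they are not read from the figures.

```python
BLOCK_SIZE_MB = 128  # assumed block size for the illustration


def remote_mb_for_join(layout_left, layout_right):
    """MB of the right-hand file that must cross the network for a join.

    Each layout maps block index -> DataNode. Block i of the left file is
    joined with block i of the right file on the left block's DataNode.
    """
    remote_blocks = sum(
        1 for i, node in layout_left.items() if layout_right[i] != node
    )
    return remote_blocks * BLOCK_SIZE_MB


# Hypothetical placements of three blocks per file.
file_a = {0: "DN1", 1: "DN2", 2: "DN3"}
file_d_scattered = {0: "DN4", 1: "DN5", 2: "DN3"}
file_d_colocated = {0: "DN1", 1: "DN2", 2: "DN3"}

print(remote_mb_for_join(file_a, file_d_scattered))
print(remote_mb_for_join(file_a, file_d_colocated))
```

In the scattered layout two of the three block pairs sit on different nodes, so 256 MB must move; in the colocated layout nothing moves.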
Summary
This module described the basic concepts of HDFS, its
application scenarios, its technical architecture, and
its key features.
THANK YOU!