
BDP 2024 07

This document summarizes key aspects of storing and processing big data in Hadoop Distributed Filesystem (HDFS). It describes the HDFS architecture with NameNode and DataNodes, reading and writing data to HDFS in a distributed manner, strategies for handling node failures, HDFS federation to address memory issues, balancing data across clusters, block caching, and common filesystem operations like copying, creating directories, and listing files.

Big Data Processing

Jiaul Paik
Lecture 7
Storing Big Data in Cluster

Hadoop Distributed Filesystem

HDFS (Hadoop) Architecture
namenode = master node

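[Figure: HDFS architecture]
• The application accesses files through the HDFS client
• The client sends (file name, block id) to the HDFS namenode, which holds the file namespace (e.g. /foo/bar maps to block 3df2)
• The namenode replies with (block id, block location)
• The namenode sends instructions to the datanodes and receives datanode state
• The client sends (block id, byte range) to an HDFS datanode, which returns the block data
• Each HDFS datanode stores its blocks in the local Linux file system

(Ghemawat et al., SOSP 2003)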
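
The metadata lookup in the architecture figure can be sketched as two in-memory maps. This is a toy model, not the HDFS API: the class name, method name, and datanode names are assumptions made for illustration; only /foo/bar and block 3df2 come from the figure.

```python
# Toy model of the namenode's metadata (hypothetical names, not the HDFS API).
class ToyNameNode:
    def __init__(self):
        # File namespace: file name -> ordered list of block ids.
        self.namespace = {"/foo/bar": ["3df2"]}
        # Block map: block id -> datanodes holding a replica (made-up names).
        self.block_locations = {"3df2": ["datanode-1", "datanode-2", "datanode-3"]}

    def get_block_locations(self, file_name):
        """Return [(block id, [datanode, ...]), ...] for one file."""
        return [(b, self.block_locations[b]) for b in self.namespace[file_name]]


namenode = ToyNameNode()
locations = namenode.get_block_locations("/foo/bar")
print(locations)
```

The namenode only answers "where is the block?"; the block data itself is fetched from a datanode.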

HDFS

[Figure: daemons running on a Hadoop cluster]
• namenode: runs the namenode daemon
• job submission node: runs the jobtracker
• Each slave node: runs a tasktracker and a datanode daemon on top of the Linux file system
HDFS
Reading and Writing
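Dataflow: Reading data from HDFS

[Figure: read dataflow between the HDFS client, the namenode, and three datanodes]
1: open (HDFS client to DistributedFileSystem)
2: get block locations (DistributedFileSystem to NameNode)
3: read (HDFS client to FSDataInputStream)
4: read (FSDataInputStream to the first DataNode)
5: read (FSDataInputStream to the next DataNode)
6: close (HDFS client to FSDataInputStream)

Adapted from: Hadoop: The Definitive Guide, 4th ed., Tom White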
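
The numbered steps of the read dataflow can be traced with a small simulation. It is a sketch under stated assumptions: the dictionaries stand in for the namenode and datanodes, the path, block ids, and node names are invented, and picking the first replica assumes the list is already sorted closest-first.

```python
# Toy read path: metadata comes from the namenode, bytes from the datanodes.
# All names and contents are hypothetical; this is not the Hadoop client API.
NAMENODE = {"/logs/a.txt": [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])]}
DATANODES = {
    "dn1": {"blk_1": b"hello "},
    "dn2": {"blk_1": b"hello ", "blk_2": b"world"},
    "dn3": {"blk_2": b"world"},
}

def read_file(path):
    trace = [("open", path)]                       # step 1
    blocks = NAMENODE[path]                        # step 2: get block locations
    trace.append(("get block locations", len(blocks)))
    data = b""
    for block_id, replicas in blocks:              # steps 3-5: read block by block
        node = replicas[0]                         # assumed to be the closest replica
        data += DATANODES[node][block_id]
        trace.append(("read", block_id, node))
    trace.append(("close", path))                  # step 6
    return data, trace

data, trace = read_file("/logs/a.txt")
print(data)
```

Note that the file contents never pass through the namenode, which keeps it from becoming a bottleneck as the number of clients grows.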

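Writing data to HDFS

[Figure: write dataflow between the HDFS client, the namenode, and a pipeline of three datanodes]
1. Create (HDFS client to DistributedFileSystem)
2. Create (DistributedFileSystem to Namenode)
3. Write (HDFS client to FSDataOutputStream)
4. Write packet (FSDataOutputStream to the first datanode, then forwarded along the pipeline of datanodes)
5. Ack packet (back along the pipeline to FSDataOutputStream)
6. Close (HDFS client to FSDataOutputStream)
7. Complete (DistributedFileSystem to Namenode)

Adapted from: Hadoop: The Definitive Guide, 4th ed., Tom White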
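
The write-packet and ack-packet steps can be illustrated with a toy pipeline. This is a simplified sketch, not Hadoop's implementation: the function name, the packet size, and the datanode names are assumptions, and real packets are streamed concurrently rather than one at a time.

```python
def write_file(data, packet_size, pipeline):
    """Send data packet by packet through a pipeline of datanodes (toy model)."""
    stores = {node: [] for node in pipeline}
    ack_log = []
    for start in range(0, len(data), packet_size):
        packet = data[start:start + packet_size]
        for node in pipeline:                  # write packet: store, then forward
            stores[node].append(packet)
        ack_log.append(list(reversed(pipeline)))   # ack packet: flows back upstream
    replicas = {node: b"".join(chunks) for node, chunks in stores.items()}
    return replicas, ack_log

replicas, ack_log = write_file(b"abcdefghij", packet_size=4,
                               pipeline=["dn1", "dn2", "dn3"])
print(replicas)
```

Every datanode in the pipeline ends up with a full copy of the data, and each packet is acknowledged by the last datanode first and the first datanode last.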

Managing Hadoop: Other Key Issues
• Node failure

• HDFS federation (for memory issue)

• Cluster Balancing

• Data Caching
Node failures
• Namenode failure
• All the files in the filesystem are effectively lost
• Without the namenode's metadata, the files cannot be reconstructed from their blocks

• Datanode failure
• Usually not a problem
• Each data block is replicated on several machines
• A lost block can be recovered from a replica on another machine
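
Recovery from a datanode failure can be sketched as re-replication from the surviving copies. This is a minimal illustration, not HDFS code: the function names and node names are invented, and the target of three replicas is an assumption (it matches the usual HDFS default, but the slides do not state it).

```python
def live_replicas(block_map, failed):
    """Drop failed datanodes from every block's replica list."""
    return {b: [n for n in nodes if n not in failed] for b, nodes in block_map.items()}

def re_replicate(block_map, all_nodes, failed, target=3):
    """Copy under-replicated blocks from a surviving replica to other live nodes."""
    healthy = [n for n in all_nodes if n not in failed]
    result = {}
    for block, nodes in live_replicas(block_map, failed).items():
        if not nodes:
            raise RuntimeError(f"block {block} lost: no surviving replica")
        spare = [n for n in healthy if n not in nodes]
        result[block] = nodes + spare[: max(0, target - len(nodes))]
    return result

block_map = {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn2", "dn3", "dn4"]}
recovered = re_replicate(block_map, ["dn1", "dn2", "dn3", "dn4", "dn5"], failed={"dn2"})
print(recovered)
```

A block is lost only if every machine holding one of its replicas fails before the copy is made.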
Tackling Namenode failure
• If the namenode fails, all the metadata is lost
• The files can no longer be reconstructed from their blocks

• How to handle?

• Maintain a replica of the metadata on another, passive machine

• If the active namenode fails, start the passive namenode

• The passive namenode needs to load the namespace into memory before it can start serving
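
The active/passive idea can be sketched with a shared log of metadata changes that the passive namenode replays before taking over. This is a toy model under stated assumptions: the class, the edit tuple format, and the shared list are hypothetical stand-ins, not how HDFS stores or replicates its metadata.

```python
class ToyNamenode:
    def __init__(self):
        self.namespace = {}      # in-memory map: file name -> block ids
        self.active = False

    def apply(self, edit):
        op, path, blocks = edit
        if op == "create":
            self.namespace[path] = blocks
        elif op == "delete":
            self.namespace.pop(path, None)

shared_edit_log = []             # stands in for the replicated metadata

def record(active, edit):
    shared_edit_log.append(edit)  # persist the change before applying it
    active.apply(edit)

def fail_over(standby):
    for edit in shared_edit_log:  # load the namespace into memory first
        standby.apply(edit)
    standby.active = True         # only then start serving
    return standby

active = ToyNamenode()
active.active = True
record(active, ("create", "/a", ["blk_1"]))
record(active, ("create", "/b", ["blk_2"]))
record(active, ("delete", "/a", None))
standby = fail_over(ToyNamenode())
print(standby.namespace)
```

The replay loop is the reason failover is not instantaneous: the longer the metadata history, the longer the passive namenode takes to come up.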

HDFS Federation

• The namenode keeps a reference to every file and block in the filesystem in memory

• For a very large cluster, the namenode may run out of memory to hold the metadata

• Solution: add more namenodes to the cluster, each managing a portion of the filesystem namespace
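
One way to picture several namenodes sharing a cluster is a table that routes each path to the namenode responsible for its part of the namespace. The sketch below is illustrative only: the prefixes, namenode names, and the function are assumptions, not an HDFS configuration.

```python
# Hypothetical mount table: namespace prefix -> namenode responsible for it.
MOUNT_TABLE = {"/user": "namenode-1", "/data": "namenode-2", "/tmp": "namenode-3"}

def namenode_for(path):
    """Pick the namenode whose prefix is the longest match for the path."""
    matches = [p for p in MOUNT_TABLE if path == p or path.startswith(p + "/")]
    if not matches:
        raise KeyError(f"no namenode manages {path}")
    return MOUNT_TABLE[max(matches, key=len)]

print(namenode_for("/user/alice/report.txt"))
print(namenode_for("/data"))
```

Because each namenode holds metadata only for its own prefixes, the memory needed per namenode shrinks as namenodes are added.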

HDFS Cluster Balancing
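• When copying data into HDFS, it is important to keep data storage balanced across the cluster

• Why?
• HDFS works best when blocks are spread evenly across the datanodes

• Example:
• In distcp, the -m option sets the number of map tasks (m) that do the copying
• If m = 1, a single task does all the copying
• It will be slow
• Cluster resources are poorly utilized
• The first replica of every block ends up on the node running that task, which unbalances the cluster

• The default value of m is 20 in Hadoop.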
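
The effect of m on copy time can be estimated with a toy assignment of files to map tasks. This is a sketch, not distcp's actual splitting logic: the greedy strategy, the function name, and the 40 files of 128 MB are all assumptions chosen to make the arithmetic easy.

```python
def assign_to_maps(file_sizes, m):
    """Greedily give each file to the map task with the fewest bytes so far."""
    loads = [0] * m
    for size in sorted(file_sizes, reverse=True):
        loads[loads.index(min(loads))] += size
    return loads

sizes_mb = [128] * 40                      # 40 hypothetical files of 128 MB each
one_map = assign_to_maps(sizes_mb, m=1)
twenty_maps = assign_to_maps(sizes_mb, m=20)
print(max(one_map), max(twenty_maps))
```

With one map the single task copies all 5120 MB; with twenty maps the busiest task copies 256 MB, so the copy work is spread over many nodes instead of one.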

Block Caching
• Normally, a datanode reads blocks from disk

• Frequently accessed blocks can be cached in the datanode's memory (RAM)

• By default, a block is cached in only one datanode's memory

• Job schedulers try to run tasks on the datanode where the block is cached
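
Cache-aware placement can be sketched as a simple preference rule. The function and the dictionaries are hypothetical, and real schedulers weigh many more factors; the sketch only shows the "prefer the cached copy" decision.

```python
def pick_node(block_id, replicas, cached_on):
    """Prefer the datanode that caches the block; else fall back to a disk replica."""
    cached = cached_on.get(block_id)
    if cached in replicas[block_id]:
        return cached, "memory"
    return replicas[block_id][0], "disk"

replicas = {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn2", "dn3", "dn4"]}
cached_on = {"blk_1": "dn3"}               # each block cached on at most one datanode

print(pick_node("blk_1", replicas, cached_on))
print(pick_node("blk_2", replicas, cached_on))
```

A task placed on the caching datanode reads the block from memory; any other placement falls back to a disk read.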
Filesystem Operations
Filesystem Operations
• Major Filesystem operations:
• reading files, creating directories, moving files, deleting data, and listing
directories.

• Hadoop filesystem commands can be run from the command line

• To see detailed help for every command:

hadoop fs -help
Filesystem Operations
• Copying a file from the local filesystem to HDFS

hadoop fs -copyFromLocal source-file dest-file

• Copying a file to the local filesystem from HDFS

hadoop fs -copyToLocal source-file dest-file
Filesystem Operations
• Creating a directory
hadoop fs -mkdir mydir

• Listing the files

hadoop fs -ls
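
When these commands need to be scripted, one option is to build them as argument lists and hand them to a process runner. The helper below is a hypothetical convenience wrapper, not part of Hadoop; it only assembles the command lines shown above and prints them, so it runs without a cluster. The file and directory names are placeholders.

```python
import shlex

def hadoop_fs(option, *args):
    """Build (but do not run) a `hadoop fs` command as an argument list."""
    return ["hadoop", "fs", f"-{option}", *args]

commands = [
    hadoop_fs("mkdir", "mydir"),
    hadoop_fs("copyFromLocal", "file-1", "mydir/file-1"),
    hadoop_fs("ls", "mydir"),
    hadoop_fs("copyToLocal", "mydir/file-1", "file-1.copy"),
]
for cmd in commands:
    # To execute on a machine with Hadoop installed: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Passing a list rather than a single string avoids shell quoting problems when file names contain spaces.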
