HDFS Cartoon

HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
© Maneesh Varshney. mvarshney@gmail.com

THE CAST

Client: People sit in front of me and ask me to read and write data.
Namenode: There is only ONE of me, and I coordinate everything around here.
Datanodes: We store data. There are MANY of us, sometimes even thousands!

WRITING DATA IN HDFS CLUSTER

REQUEST FROM USER
Let's start with writing some data.
User: Mr. Client, please write 200 MB of data for me.

BLOCK AND REPLICATION
Client: Aren't you forgetting something?
User: Ah yes, please: (a) divide the data into 128 MB blocks, and (b) store each block in three places.
BLOCKSIZE: a large file is divided into blocks (usually 64 or 128 MB).
REPLICATION FACTOR: each block is stored in multiple locations (usually 3).

DIVIDE FILE INTO BLOCKS
Client: First, I divide the file into blocks. Let's work on the first block first.

ASK NAMENODE
Client: Mr. Namenode, please help me write a 128 MB block with replication of 3.

NAMENODE ASSIGNS DATANODES
Namenode: Replication 3... I need to find 3 datanodes. (How do I do that? I will tell you some other time.)
Namenode: Here you go, buddy: addresses of three datanodes. I have also sorted them in increasing distance from you: Datanode 1, Datanode 2, Datanode 3.

CLIENT STARTS WRITING DATA
Client: I send my data (and the list) to the first datanode only.
Datanode: I store the data on my hard drive, and WHILE I am receiving data, I forward the same data to the next datanode.
TA-DA... the REPLICATION PIPELINE.

INFORM NAMENODE WHEN DONE
Datanodes: Once all data (for this block) is written to hard disk, I send DONE to the namenode.
Done! The block is successfully stored and replicated in HDFS.
Client: When I am done with a block, I repeat the same steps for the remaining blocks.

WHEN ALL BLOCKS ARE WRITTEN
Client: All blocks written, please close the file.
Namenode: NOW I store all meta information in persistent storage (hard disks).

RECAP
Datanodes: We stored data via the replication pipeline.

READING DATA IN HDFS CLUSTER

REQUEST FROM USER
Writing data in HDFS: check. What about reading it? Let's ask the client again.
User: Mr. Client, please read this file.

CONTACT NAMENODE FIRST
Client: Please give me info on this file. [Filename]
Namenode: I reply with (a) the list of all blocks for this file, and (b) the list of datanodes for each block (sorted by distance from the client).
Block 1: at DN x1, y1, z1
Block 2: at DN x2, y2, z2
Block 3: at DN x3, y3, z3
and so on.
Client: So I download each block in turn.

DOWNLOAD DATA
Client: I download data from the nearest datanode (the first in the list).
Client: Please give me block n.
Datanode: DATA for block n.
Umm... question: what happens when the datanode is dead, or does not have the data, or the data is corrupted?
Actually, HDFS can very elegantly handle these faults and more, as we will see next.

FAULT TOLERANCE IN HDFS. PART I: TYPES OF FAULTS AND THEIR DETECTION

There are typically three kinds of faults.

FAULT I: NODE FAILURE
The first is NODE FAILURE. ("Goodbye, cruel world.")

FAULT II: COMMUNICATION FAILURE
The second is COMMUNICATION FAILURE (cannot send and receive data).

FAULT III: DATA CORRUPTION
The third is DATA CORRUPTION. Data can be corrupted while being sent over the network, or while it is stored on hard disks.

DETECTION #1: NODE FAILURES
NOTE: If the namenode is dead, the entire cluster is dead! The namenode is the SINGLE POINT OF FAILURE. Instead, let's focus on how datanode failures are detected.

DETECTING DATANODE FAILURE
Datanodes: We send a HEARTBEAT message every 3 seconds. This is our way of saying we are alive.
Namenode: If I don't get a message in 10 minutes, the datanode is dead to me. (It may be ALIVE with only a network failure, but the namenode treats both cases the same.)

DETECTION #2: NETWORK FAILURES
Whenever data is sent, the receiver replies with an ACK. If the ACK is not received (after several retries), the sender assumes that the host is dead or the network has failed.

DETECTION #3: CORRUPTED DATA
A checksum is sent along with the transmitted data: [data][checksum].

DETECTING CORRUPTED HARD DRIVES
Datanode: When I store data on hard disks, I also store the checksum. Periodically, I send a BLOCK REPORT to the namenode (the list of all blocks I have). Before sending the block report, I check whether the checksums are OK. I don't send info for blocks that are corrupted.
Datanode: I have four blocks.
Namenode: The report is missing a block... so one is corrupted.

RECAP: HEARTBEAT MESSAGES AND BLOCK REPORTS
Datanodes: We send heartbeats every 3 seconds to say we are alive. We send block reports, and we skip blocks that are corrupted (which is how the namenode will know which blocks are lost).

FAULT TOLERANCE IN HDFS. PART II: HANDLING READING AND WRITING FAILURES

HANDLING WRITE FAILURES
Client: (One thing I should have said earlier.) I write the block in smaller data units (usually 64 KB) called packets. Moreover, each datanode replies with an ACK for each packet to confirm that it got it. So, if I don't get ACKs from some datanode, I know it is dead.
Remember the replication pipeline? Here's the adjusted pipeline. Note that the block will be under-replicated, but the namenode will take care of that later.

HANDLING READ FAILURES
Client: Remember, when asked for a block, the namenode gave me the locations of all its datanodes: DN 1, DN 2, DN 3. If one datanode is dead, I read from the others in the list.

FAULT TOLERANCE IN HDFS. PART III: HANDLING DATANODE FAILURES

Namenode: First, I must tell you about the two tables I keep: the List of Blocks and the List of Datanodes.

List of Blocks
Block 1 - stored at DN1, DN2, DN3
Block 2 - stored at DN1, DN4, DN5

Namenode: I continuously update these two tables. If I find that a block on a datanode is corrupted, I update the first table (by removing the bad DN from the block's list). And if I find that a datanode has died, I update both tables.

UNDER REPLICATED BLOCKS
Namenode: I scan the first list (list of blocks) periodically, and see if there are blocks that are not replicated properly. For all under-replicated blocks, I ask other datanodes to copy them from datanodes that have the replica. Like so:
Namenode: Could you copy the block from that datanode?
Datanode: Hey, I need to copy a block from you.
Other datanode: Here you go.

Umm... one more question: this only works if at least one replica of the block survives, right?
That's correct. HDFS cannot guarantee that at least one replica will always survive. But it tries its best by smartly selecting replica locations, as we will see next.

REPLICA PLACEMENT STRATEGY
Namenode: Remember, I promised to tell you how I select datanode locations for storing the replicas of a block.

RACKS AND DATANODES
The cluster is divided into RACKS. Each rack has multiple datanodes. [Rack 1] [Rack 2] [Rack 3]

SELECTING FIRST REPLICA LOCATION
If the writer is a member of the cluster, it is selected as the first replica. Otherwise, some random datanode is selected.

NEXT TWO REPLICA LOCATIONS
Pick a different rack than the first replica's. Select two different datanodes on that rack.

SUBSEQUENT REPLICA LOCATIONS
Pick any random datanode, if it satisfies these two conditions: (1) only one replica per datanode, and (2) at most two replicas per rack.
Sometimes those two conditions cannot be satisfied, in which case they are... ahem... ignored (convenient, eh?).
Also, HDFS allows you to use your own placement algorithm. So if you know a better algorithm, don't be shy now.

Namenode: I do a lot of other things as well. Read more about me on websites and in books. Or best of all, install and run HDFS.

In our next comics: read about MapReduce.

THE END
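The replica placement rules from the REPLICA PLACEMENT STRATEGY panels can be tried out in a few lines of Python. This is only a rough sketch of the rules as the comic states them, not the namenode's actual code: the function name `choose_replicas`, the rack and datanode names, and the choice to relax only the per-rack limit when the conditions cannot be met are all assumptions made for illustration.

```python
import random
from collections import Counter


def choose_replicas(racks, replication=3, writer=None, rng=None):
    """Pick datanodes for one block.

    racks: dict mapping a rack name to a list of datanode names (hypothetical names).
    writer: name of the writing node, if it happens to be a datanode in the cluster.
    """
    rng = rng or random.Random()
    rack_of = {dn: rack for rack, dns in racks.items() for dn in dns}

    # First replica: the writer if it is a cluster member, else a random datanode.
    first = writer if writer in rack_of else rng.choice(sorted(rack_of))
    chosen = [first]

    # Next two replicas: two different datanodes on one rack other than the first's.
    want = min(2, replication - 1)
    if want > 0:
        remote = [r for r in sorted(racks)
                  if r != rack_of[first] and len(racks[r]) >= want]
        if not remote:
            raise ValueError("no other rack can hold the next replicas")
        chosen += rng.sample(racks[rng.choice(remote)], want)

    # Subsequent replicas: any random datanode with no replica yet,
    # on a rack that holds fewer than two replicas so far.
    while len(chosen) < replication:
        per_rack = Counter(rack_of[dn] for dn in chosen)
        unused = [dn for dn in sorted(rack_of) if dn not in chosen]
        if not unused:
            raise ValueError("fewer datanodes than requested replicas")
        ok = [dn for dn in unused if per_rack[rack_of[dn]] < 2]
        # Assumption: when the rack limit cannot be met, only that limit is ignored.
        chosen.append(rng.choice(ok or unused))

    return chosen[:replication]


if __name__ == "__main__":
    cluster = {
        "rack1": ["dn1", "dn2", "dn3"],
        "rack2": ["dn4", "dn5", "dn6"],
        "rack3": ["dn7", "dn8", "dn9"],
    }
    print(choose_replicas(cluster, replication=3, writer="dn1"))
```

With the writer on `dn1`, the first replica always stays local, and the other two always land together on either `rack2` or `rack3`, so losing one whole rack never destroys every copy of the block.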
