
Full Name: EL AMINE MEHDI

Apache Hadoop is an open-source, scalable, and fault-tolerant framework written in Java. It efficiently processes large volumes of data on a cluster of commodity hardware. Hadoop is not only a storage system; it is a platform for both large-scale data storage and processing.
In this lecture, we take a look at how Apache Hadoop works under the hood. When Apache Hadoop is fed a huge file, the framework divides that chunk of big data into smaller pieces and stores them across multiple machines to be processed in parallel. That is why Hadoop interconnects an army of widely available and relatively inexpensive machines that form a Hadoop cluster. No matter what the size of the file the user feeds to Hadoop, every Hadoop cluster accommodates three functional layers: the Hadoop Distributed File System (HDFS) for data storage, Hadoop MapReduce for processing, and Hadoop YARN for resource management.
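To make this concrete, here is a minimal sketch in Java (Hadoop's own language) that copies a local file into HDFS through the standard FileSystem API; once the file is written, the framework transparently splits it into blocks and distributes them across the cluster. The cluster address hdfs://localhost:9000 and both file paths are illustrative assumptions, not values from the lecture.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; adjust to your cluster's fs.defaultFS.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        try (FileSystem fs = FileSystem.get(conf)) {
            // Hadoop splits this file into blocks and spreads the blocks
            // (and their replicas) across the DataNodes of the cluster.
            fs.copyFromLocalFile(new Path("/tmp/bigdata.log"),
                                 new Path("/user/data/bigdata.log"));
        }
    }
}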
We then get a brief introduction to HDFS, a distributed file system that follows a master/slave architecture. It consists of a single NameNode and many DataNodes. In the HDFS architecture, a file is divided into one or more blocks of 128 MB (the size can be changed in the configuration) and stored in separate DataNodes. DataNodes are responsible for operations such as block creation, deletion, and replication according to NameNode instructions. Apart from that, they are responsible for performing read and write operations on behalf of file system clients.
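As an illustration of this block layout, the following sketch asks the NameNode where each block of a file lives. The getFileBlockLocations call is part of the stock FileSystem API; the path /user/data/bigdata.log is simply the assumed file from the previous example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed cluster address

        try (FileSystem fs = FileSystem.get(conf)) {
            FileStatus status = fs.getFileStatus(new Path("/user/data/bigdata.log"));
            // One BlockLocation per 128 MB block; getHosts() lists the
            // DataNodes holding that block's replicas.
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("offset=" + block.getOffset()
                        + " length=" + block.getLength()
                        + " hosts=" + String.join(",", block.getHosts()));
            }
        }
    }
}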
The NameNode acts as the master server and the central controller for HDFS. It holds the file system metadata and maintains the file system namespace. The NameNode monitors the condition of the DataNodes and coordinates access to data.
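Since the NameNode is the metadata authority, a client can query per-file details without contacting any DataNode. The sketch below reads a few such fields through the same FileStatus object; as before, the connection details and the file path are assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsMetadata {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address

        try (FileSystem fs = FileSystem.get(conf)) {
            // All of these answers come from the NameNode's namespace
            // metadata; no DataNode is contacted for this information.
            FileStatus st = fs.getFileStatus(new Path("/user/data/bigdata.log"));
            System.out.println("size (bytes): " + st.getLen());
            System.out.println("block size:   " + st.getBlockSize());
            System.out.println("replication:  " + st.getReplication());
            System.out.println("permissions:  " + st.getPermission());
        }
    }
}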
