
Running head: ARTICLE SUMMARY

Hadoop, MapReduce and HDFS: A Developers Perspective

Mohd Rehan Ghazi, Durgaprasad Gangodkar

Article Summary

Name of Student

Institution Affiliation
ARTICLE SUMMARY

The article [1] summarized here describes the need for big-data-friendly architectures that can process very large data sets efficiently. The authors discuss the Hadoop architecture, its distributed file system (HDFS), and the MapReduce programming model.

Discussion

The authors first explain the need for storage clusters, noting that such clusters require fault tolerance, data distribution, parallel processing, high availability, and scalability. Apache Hadoop is the most popular open-source implementation of MapReduce, the programming model introduced by Google. It allows big data to be processed on inexpensive commodity hardware, an approach that has been a game changer for large companies such as Google and Facebook.

Migrating to cloud computing is not only cost-effective but also delivers high performance. Hadoop is efficient and offers several cloud-deployable solutions that ease the transition to the cloud. Architecturally, Hadoop consists of five daemons, each running in its own Java Virtual Machine (JVM): the NameNode, Secondary NameNode, and JobTracker on the master side, and the DataNode and TaskTracker on the slave side. Fig. 1 in [1] shows the hierarchy of these daemons. A Hadoop cluster is therefore made up of master nodes, which run the master daemons of the storage (HDFS) and processing (MapReduce) layers, and slave nodes, which make up the remainder of the machines and run the slave daemons.
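To make the master/slave split concrete, here is a minimal Python sketch of the classic Hadoop 1.x daemon layout. The `Daemon` class and the `plan_cluster` helper are hypothetical illustrations, not part of any Hadoop API, and the placement is deliberately simplified (all master daemons share one host). In Hadoop 2 and later, YARN's ResourceManager and NodeManager take over the roles of the JobTracker and TaskTracker.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Daemon:
    """One Hadoop 1.x daemon; on a real cluster each runs in its own JVM."""
    name: str
    role: str   # "master" or "slave"
    layer: str  # "HDFS" (storage) or "MapReduce" (processing)


# The five classic Hadoop 1.x daemons.
DAEMONS = [
    Daemon("NameNode", "master", "HDFS"),
    Daemon("SecondaryNameNode", "master", "HDFS"),
    Daemon("JobTracker", "master", "MapReduce"),
    Daemon("DataNode", "slave", "HDFS"),
    Daemon("TaskTracker", "slave", "MapReduce"),
]


def plan_cluster(master: str, workers: List[str],
                 master_also_worker: bool = False) -> Dict[str, List[str]]:
    """Hypothetical helper: map each host name to the daemons it would run.

    Simplification: all master daemons share one host; production clusters
    often put the SecondaryNameNode on a separate machine.
    """
    master_daemons = [d.name for d in DAEMONS if d.role == "master"]
    slave_daemons = [d.name for d in DAEMONS if d.role == "slave"]
    plan = {master: list(master_daemons)}
    if master_also_worker:
        # Optionally let the master host run the slave daemons as well.
        plan[master] += slave_daemons
    for host in workers:
        plan[host] = list(slave_daemons)
    return plan


if __name__ == "__main__":
    for host, daemons in plan_cluster("master01", ["worker01", "worker02"]).items():
        print(host, daemons)
```

Keeping the daemon list as data rather than hard-coding it per host makes the master/slave distinction explicit: the role field alone decides where each daemon is placed.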

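The biggest advantage of the Hadoop Distributed File System (HDFS) is fault tolerance. Because each data block is replicated across several DataNodes (three by default), HDFS continues to provide service even when a node fails, which reduces the probability of catastrophic failure. MapReduce, the processing framework implemented by Apache Hadoop, takes its name from its two phases of data analysis: the Map phase, which transforms input records into intermediate key-value pairs, and the Reduce phase, which aggregates all values that share the same key.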
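The two phases can be illustrated with the classic word-count example. The sketch below is a single-process Python simulation, not Hadoop's Java API: `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names, and the grouping step stands in for the shuffle-and-sort stage that Hadoop runs between the Map and Reduce phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all intermediate values by key, in sorted key order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate every value that shares the same key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each map call would run in parallel on the node holding
    # that input split; here everything runs sequentially in one process.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    splits = ["Hadoop stores data in HDFS", "MapReduce processes data in Hadoop"]
    print(word_count(splits))
```

Because each map call depends only on its own input line, a framework like Hadoop can run the map calls in parallel on the nodes that hold the data, and can likewise run the reduce calls for different keys in parallel.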

What I found most interesting is that master nodes can also act as slave nodes, which suggests that a cluster sometimes needs extra slave capacity, whereas the reverse is not possible because slave nodes cannot become master nodes. This raised a question for me: if master nodes can also act as slave nodes, what are the implications for the hierarchy of daemons that the authors describe?


Conclusion

This article helped familiarize me with the world of big data. It showed how large technology companies such as Google and Facebook require very efficient query processing while keeping costs to a minimum, and how the Hadoop architecture addresses both needs. I also came to understand how Hadoop's storage layer (HDFS) and its processing framework (MapReduce) work together to optimize performance.


References

Articles in Scholarly Journals

[1] Ghazi, M. R., & Gangodkar, D. (2018). Hadoop, MapReduce and HDFS: A developers perspective.
