Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 10

MapReduce in Cloud Computing

MapReduce , MapReduce Paradigm, MapReduce


Examples,Hadoop,HDFS
Distributed system
• A distributed system is a system whose components are located on
different networked computers, which communicate and coordinate
their actions by passing messages to one another.

• A distributed system allows resource sharing, including software by


systems connected to the network. Examples of distributed systems
/ applications of distributed computing : Intranets, Internet, WWW,
email.
MapReduce
MapReduce is a general-purpose programming model for
data-intensive computing
• Pioneered by Google
• Processes 20 PB of data per day
• Popularized by open-source Hadoop project
• Used by Yahoo!, Facebook, Amazon, …
• It uses a parallel computing model that distributes
computational tasks to large number of nodes(approx
1000-10000 nodes.)

It is fault-tolerable. It can work even when 1600 nodes


among 1800 nodes fails.
• In MapReduce model, user has to write only two functionsmap and
reduce
Few examples that can be easily expressed as
MapReduce computations:
Distributed Grep
Count of URL Access Frequency
Inverted Index
Mining
Example :
• MapReduce is a programming model for large-scale
computing
It uses distributed environment of the cloud to process
large amount of data in reasonable amount of time.
It was inspired by map and reduce function of Functional
Programming Language(like LISP, scheme, racket)[3].
Map and Reduce in Racket (Functional Programming
Language)
Map:
(map f list1) ! list2
e.g. (map square ’(1 2 3 4 5)) ! ’(1 4 9 16 25)
Reduce:
(foldl f init list1) ! any
e.g. (foldl + 0 ’(1 2 3 4 5)) ! 15
• It analyzes Hadoop.
Hadoop is the implementation of MapReduce Model.
• It process data parallely in distributed manner.
• It divides the data into different logical blocks and process
these logical blocks in parallel on different machines and at
last combines all the results to produce the final result.
• It is fault-tolerable.
• One attractive feature of Hadoop is that user can write the
map and reduce functions in any programming langauge.
Approach Used

• Hadoop is an open source Java framework for processing


large amount of data on the clusters of machines[1].
Hadoop is the implementation of Google’s MapReduce
programming model.
Yahoo is the biggest contributor of Hadoop[5].
Hadoop has mainly two components:
• Hadoop Distributed File System (HDFS)
• MapReduce
HDFS
• HDFS provides support for distributed storage[1].
Like traditional File System, the files can be deleted,
renamed etc.
HDFS has two types of nodes:
• Name Node
• Data Node
Name Node
• Name Node:
Name Node provides the main data services.
It is a process that runs on a separate machine.
It stores only the meta-data of the files and directories.
Programmer access files through it.
For reliablity of the file system, it keeps multiple copies of
the same file blocks.
Data Node
• Data Node:
Data Node is a process that runs on individual machine of
the cluster.
The file blocks are stored in the local file system of these
nodes.
It periodically send the meta-data of the stored blocks to the
Name Node.

You might also like