Professional Documents
Culture Documents
Unit 1 Lecture 3
Unit 1 Lecture 3
▪ Since it functions as “semi -reducer” ,it has the same interface as reducer and often are
the same class. The combiner() executes on each machine that performs a map task
▪ Combiner combines the values of all keys of a single mapper (single machine) :
Shuffle phase in Hadoop transfers the map output from Mapper to a Reducer
in MapReduce.
Sort phase in MapReduce covers the merging and sorting of map outputs.
Data from the mapper are grouped by the key, split among reducers and
sorted by the key.
Every reducer obtains all values associated with the same key.
Shuffle and sort phase in Hadoop occur simultaneously and are done by the
MapReduce framework
Example suited for map reduce:
Host size
◼ Suppose we have a large web corpus
◼ Look at the metadata file
▪ Lines of the form: (URL, size, date, …)
◼ For each host, find the total number of bytes
▪ That is, the sum of the page sizes for all URLs from that
particular host
◼ Other examples:
▪ Link analysis and graph processing
▪ Machine Learning algorithms
A B B C A C
a1 b1 b2 c1 a3 c1
a2 b1 b2 c2 a3 c2
a3 b2 b3 c3 a4 c3
a4 b3