
Question 1: Briefly describe the basic idea of MapReduce.

Answer: MapReduce is a programming model for data processing. The model is
simple, yet expressive enough to capture useful programs. Hadoop can run MapReduce
programs written in various languages. Most importantly, MapReduce programs
are inherently parallel, which puts very large-scale data analysis into the hands of
anyone with enough machines at their disposal. MapReduce comes into its own for
large datasets.
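The model described above can be sketched in a few lines of plain Python. This is a toy, single-process simulation of the map, shuffle, and reduce phases (not Hadoop code), using word count as the classic example:

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in an input line."""
    for word in line.split():
        yield (word, 1)

def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return (word, sum(counts))

def mapreduce(lines):
    # Shuffle: group all intermediate values by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    # Reduce each group independently -- this per-key independence is
    # what lets real MapReduce run the reducers in parallel.
    return dict(reduce_phase(k, v) for k, v in sorted(groups.items()))

if __name__ == "__main__":
    data = ["the quick brown fox", "the lazy dog", "the fox"]
    print(mapreduce(data))  # prints {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

In a real cluster, the map calls run on the machines holding the input data and the shuffle moves intermediate pairs over the network, but the logical flow is exactly this.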

Question 2: What programming models are available for
MapReduce?

Answer: The MapReduce framework in Hadoop has native support for
running Java applications. It also supports running non-Java applications in Ruby,
Python, C++ and a few other programming languages, via two frameworks, namely
the Streaming framework and the Pipes framework.
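Hadoop Streaming talks to a non-Java mapper over standard input and output: the framework feeds input lines on stdin and expects tab-separated `key<TAB>value` records on stdout. A minimal Streaming-style word-count mapper in Python might look like this (a sketch; on a real cluster it would be passed to the hadoop-streaming jar via the `-mapper` option):

```python
import sys

def map_lines(lines):
    """Turn input lines into 'word<TAB>1' records, the format
    Hadoop Streaming expects a mapper to print on stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

if __name__ == "__main__":
    # Streaming delivers the input split to this process on stdin.
    for record in map_lines(sys.stdin):
        print(record)
```

A companion reducer script would read the sorted `key<TAB>value` lines back from stdin and sum the counts per key; keeping the logic in a plain function like `map_lines` makes the script easy to test without a cluster.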

Question 3: What are the two abstract classes that make up
the MapReduce programming model?

Answer: MapReduce is a programming model that is divided into two main
phases: the Map phase and the Reduce phase. It is designed for processing data in
parallel across various machines (nodes). Hadoop Java programs consist of a
Mapper class and a Reducer class, along with a driver class. The Reducer is
the second part of the MapReduce programming model. The Mapper produces its
output in the form of key-value pairs, which serve as input for the Reducer.
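The class structure can be mirrored in plain Python (a sketch of the shape of Hadoop's Mapper, Reducer, and driver, not the actual Hadoop API). Following TextInputFormat's convention, the map key is the line offset and the value is the line text:

```python
from collections import defaultdict

class Mapper:
    """First phase: turn each input record into key-value pairs."""
    def map(self, key, value):
        for word in value.split():
            yield (word, 1)

class Reducer:
    """Second phase: receive one key with all of its grouped values."""
    def reduce(self, key, values):
        yield (key, sum(values))

def run_job(lines):
    """Plays the role of the driver class: wires the Mapper's output,
    grouped by key, into the Reducer."""
    mapper, reducer = Mapper(), Reducer()
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for k, v in mapper.map(offset, line):
            groups[k].append(v)
    out = {}
    for k in sorted(groups):
        for key, total in reducer.reduce(k, groups[k]):
            out[key] = total
    return out
```

In real Hadoop the driver configures a Job object with the Mapper and Reducer classes and submits it to the cluster; the grouping step here stands in for the framework's shuffle.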

Question 4: What are the two main tasks that InputFormat
has?

Answer: The two main tasks of InputFormat are:
a) Split-up: it splits the input file(s) into logical InputSplits, each of which is then

assigned to an individual Mapper. An InputSplit in Hadoop MapReduce is the
logical representation of data: it describes the unit of work that corresponds to a
single map task in a MapReduce program. The split is further divided into records,
and the mapper processes each record (a key-value pair) in turn.

b) RecordReader: it provides the RecordReader implementation used to glean
input records from the logical InputSplit for processing by the Mapper. The
RecordReader reads the data within the boundaries defined by the input split and
creates the key-value pairs for the mapper.
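Both tasks can be simulated in plain Python (a toy sketch, not the Hadoop API). Mirroring TextInputFormat, the splits fall on line boundaries, the record key is the byte offset, and the value is the line:

```python
def split_input(text, n_splits):
    """Task a) Split-up: cut the input into logical splits on line
    boundaries, remembering each split's starting byte offset."""
    lines = text.splitlines(keepends=True)
    per_split = max(1, -(-len(lines) // n_splits))  # ceiling division
    splits, offset = [], 0
    for i in range(0, len(lines), per_split):
        chunk = "".join(lines[i:i + per_split])
        splits.append((offset, chunk))
        offset += len(chunk)
    return splits

def record_reader(split):
    """Task b) RecordReader: yield (byte_offset, line) key-value
    records from one logical split for the mapper to consume."""
    offset, chunk = split
    for line in chunk.splitlines(keepends=True):
        yield (offset, line.rstrip("\n"))
        offset += len(line)
```

Note that real InputSplits are byte ranges that may cut a line in half; Hadoop's LineRecordReader handles the boundary by reading past the end of its split to finish the last line, a detail this sketch sidesteps by splitting on whole lines.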

Question 5: How does MapReduce guarantee the uniqueness
of keys?

Answer: The mapper outputs each record as the key, with null as the value. The
reducer groups the nulls together by key, so there is one group per key. We then
simply output the key, since we don't care how many nulls it received. Because
each key is grouped together, the output data set is guaranteed to contain unique
keys.
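This deduplication pattern can be sketched in plain Python (a toy simulation of the pattern, not Hadoop code): the map step emits each record as a key with None as the value, the shuffle groups identical keys, and the reduce step emits each key exactly once, ignoring the size of its group:

```python
from collections import defaultdict

def dedup(records):
    """Deduplicate records via the map-with-null-value pattern."""
    groups = defaultdict(list)
    for record in records:
        groups[record].append(None)  # map: emit (record, None)
    # reduce: one output per key, no matter how many Nones it holds
    return sorted(groups.keys())

if __name__ == "__main__":
    print(dedup(["a", "b", "a", "c", "b"]))  # prints ['a', 'b', 'c']
```

The guarantee comes entirely from the shuffle: the framework promises that all values for one key reach a single reduce call, so emitting the key once per call yields a unique set.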

