A Step-by-Step Guide to Develop & Execute Your First MapReduce Program

This eBook is a step-by-step guide to developing and executing your first
MapReduce program in Hadoop. Before we start with our first program,
let's look at the prerequisites.

1. Prerequisites

01 This eBook assumes that you are familiar with key Big Data definitions and terminology. If not, please download and go through the first eBook in our Big Data series, Understanding Big Data.

02 The single-node Hadoop cluster should be ready. This means that all the daemons should be running. For a step-by-step guide to installing a Hadoop cluster, please refer to the separate installation eBook.

03 Confirm that the daemons are actually running (the original eBook shows a screenshot of this). One way to check is the jps command, which lists the running Java processes; on a Hadoop 2.x single-node setup these typically include NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.

04 Eclipse should be installed on CentOS.

www.acadgild.com 02
2. Step-by-Step Guide to Execute MapReduce Program

Follow the steps below to execute the MapReduce program:

2.1 Write the MapReduce Program in Eclipse

Before we write the MapReduce program, let us briefly introduce Eclipse.

Eclipse is an IDE for developing Java applications. We will write the
MapReduce code in Eclipse and then export it as a jar file to the Hadoop
environment.

Follow the steps below to create the Java project and the main class. The
jar file itself is built and exported in Section 4.

Step 1

Open Eclipse and click File -> New -> Java Project.

Step 2

Enter the name of the project and then click Finish.

Step 3

Right-click the source folder of the project. A pop-up menu will appear.
Click New and then select Class.

Step 4

Enter the name of the main class. Check the box that adds the main method
to the class, as shown in the given screenshot.

You can go through the link given below to understand in depth how to
write your first MapReduce program in Hadoop.

https://drive.google.com/file/d/0Bxr27gVaXO5sRExfdndnaWN6WUk/view?usp=sharing

After writing the code in Eclipse, you will see many compile errors, because the project
does not yet reference the required Hadoop libraries.

Refer to the screenshot below, where the errors are highlighted in yellow:

3. Removing the Errors

To remove the compile-time errors, add two jar files from the Hadoop
installation directory to the project's build path.

These two jar files supply the Hadoop classes that the code references,
so the errors disappear once they are added.

Follow the steps below to add the external libraries.

Step 1

Right-click the class name in the Java project you created. Then select
Build Path -> Configure Build Path.

Step 2

Select Libraries -> Add External JARs.

Step 3

Browse to the location where Hadoop is installed so that you can add the
Hadoop jar files to the project.

Step 4

Go to hadoop -> share -> hadoop inside the Hadoop installation directory.

Step 5

Open the mapreduce directory and select the
hadoop-mapreduce-client-core-2.6.0.jar file.
Click OK to add that jar file.

Step 6

Similarly, add hadoop-common-2.6.0.jar from the common directory of
Hadoop and then click OK to add that jar.
(Refer to the given screenshot.)

Step 7

Click OK to add both jars.

Step 8

The code becomes error-free after you add both jar files. The screenshot
below shows this.
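If errors remain after adding the jars, it can help to confirm that the Hadoop classes are actually visible to the project. The following is a minimal sketch, not part of the original guide: the class name HadoopClasspathCheck is hypothetical, and it assumes that org.apache.hadoop.mapreduce.Job ships in the hadoop-mapreduce-client-core jar and org.apache.hadoop.conf.Configuration in the hadoop-common jar. It uses only the standard Class.forName call, so it compiles and runs even when the Hadoop jars are missing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the eBook): reports whether the two Hadoop
// jars added in the steps above are visible on the classpath.
final class HadoopClasspathCheck {

    // Returns true if the named class can be located on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class itself, or something it depends on, could not be loaded.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: one representative class per jar.
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("org.apache.hadoop.mapreduce.Job", "hadoop-mapreduce-client-core");
        probes.put("org.apache.hadoop.conf.Configuration", "hadoop-common");

        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isOnClasspath(probe.getKey()) ? "found  " : "MISSING";
            System.out.println(status + " " + probe.getKey()
                    + " (expected in " + probe.getValue() + ")");
        }
    }
}
```

Running this class inside the same Eclipse project shows at a glance which of the two jars, if any, is still missing from the build path.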

Once the errors are removed, we need to export the code as a jar file to the Linux file system.

4. Making the Jar File and Exporting it to the Linux File System

The code is now error-free and needs to be exported as a jar file to the
Linux file system.

The jar file created from the source code must be run in the Hadoop
environment, as it cannot be executed in Eclipse in MapReduce mode.

Follow the steps below to create a jar file from the source code and then
export it to the Linux OS where Hadoop is installed.

Step 1

Right-click the source file, then select the Export option, as shown in
the given screenshot.

Step 2

Enter the name of the jar file and the location in the file system where
you want to export it. In this case, the name of the jar file is my_wc.jar
and the location is /home/acadgild/Desktop.

Step 3

Click Next to proceed to the next step.

Step 4

Click Next.

Step 5

Click the Browse option to include the main class in the jar file.

Step 6

Select the main class name and then click OK.

Step 7

Click Finish to export the jar file.

Step 8

The jar file has now been successfully exported to the specified location.

5. Executing MapReduce Program

The exported jar file is present in the /home/acadgild/Desktop directory. Let's look
at the steps involved in executing the MapReduce program.

Step 1

Create a sample input file named sample_input.

Step 2

The sample file (shown in a screenshot in the original eBook) contains many words.
We need to count the occurrences of each of these words.
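To make the goal concrete, here is a rough sketch of what a word count computes, written in plain Java with no Hadoop dependency. It is not the program from this guide: the class WordCountSketch and its methods are hypothetical, the input lines are made up, and the grouping that Hadoop performs between the map and reduce phases is imitated here with an in-memory sorted map.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the word count idea.
final class WordCountSketch {

    // "Map" step: split one line on whitespace and emit a (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: group the pairs by word and sum the ones for each word.
    // A TreeMap keeps the words sorted, similar to sorted job output.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs the map step on every line, then the reduce step on all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line));
        }
        return reduce(allPairs);
    }

    public static void main(String[] args) {
        // Made-up input lines, standing in for the contents of sample_input.
        List<String> lines = Arrays.asList("hello hadoop", "hello mapreduce hello");
        countWords(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In a real job the map and reduce steps run as separate tasks on the cluster, and Hadoop takes care of moving the (word, 1) pairs between them; the sketch only shows the arithmetic.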

Step 3

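Copy the input file to the root directory of HDFS using the following command:

hadoop dfs -put sample_input /

In Hadoop 2.x the hadoop dfs form still works but is deprecated; the equivalent current form is hdfs dfs -put sample_input /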

Step 4

Run the job with the hadoop jar command (the exact command appears in a
screenshot in the original eBook).

The first argument in this command line is the jar file, the second is the
input file, and the third is the output location.
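Given the names used elsewhere in this guide, the command would presumably look like hadoop jar my_wc.jar /sample_input /sample_out, although that exact line is an assumption. One detail is easy to miss: the jar file is consumed by the hadoop jar launcher, and because the main class was recorded in the jar during export, the program's main method receives only the input and output paths, as args[0] and args[1]. The sketch below illustrates that argument handling; the class JobArgsSketch is hypothetical and is not the driver from this guide.

```java
// Hypothetical sketch of how a driver's main method might read its arguments.
final class JobArgsSketch {
    final String inputPath;
    final String outputPath;

    private JobArgsSketch(String inputPath, String outputPath) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
    }

    // Parses the arguments as seen by main: the jar file is NOT among them.
    static JobArgsSketch parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: hadoop jar <jar file> <input path> <output path>");
        }
        return new JobArgsSketch(args[0], args[1]);
    }

    public static void main(String[] args) {
        // Paths taken from this guide's later steps; passed here directly for illustration.
        JobArgsSketch parsed = parse(new String[] {"/sample_input", "/sample_out"});
        System.out.println("input  = " + parsed.inputPath);
        System.out.println("output = " + parsed.outputPath);
    }
}
```

Failing fast with a usage message when the argument count is wrong tends to be friendlier than letting the job start and fail later on a missing path.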

Step 5

The MapReduce job starts running after you enter the command.

Step 6

The MapReduce job has now been executed!

The console shows a list of job-related counters that give the details of the executed job.

Step 7

Type the command hadoop dfs -ls /sample_out to list the files present
in the output location specified for the job.

Step 8

Type the command hadoop dfs -cat /sample_out/part-r-00000 to view the
word count for the input file.
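If you want to process the result further, the output can be read back line by line. The sketch below assumes the job uses Hadoop's default text output, where each line holds a word and its count separated by a tab; if your program sets a different output format or separator, the parsing would need to change. The class WordCountOutputSketch and the sample lines are hypothetical and are not real output from this guide's job.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for word count output lines of the form word<TAB>count.
final class WordCountOutputSketch {

    // Turns "word<TAB>count" lines into a map, preserving the order of the lines.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines, e.g. a trailing newline
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Expected word<TAB>count but got: " + line);
            }
            String word = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up lines standing in for the contents of part-r-00000.
        List<String> lines = Arrays.asList("hadoop\t1", "hello\t3", "mapreduce\t1");
        parse(lines).forEach((word, count) -> System.out.println(word + " occurs " + count + " time(s)"));
    }
}
```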

We hope this eBook has given you a clear understanding of the topic.

Keep visiting our site www.acadgild.com/blog for more updates on Big Data
and other technologies, or write to support@acadgild.com.
