Develop & Execute Your First MapReduce Program PDF


A Step-by-Step Guide to Develop & Execute
your First MapReduce Program
This eBook is a step-by-step guide to developing and executing your first
MapReduce program in Hadoop. Before we start with our first program,
let's look at the prerequisites.
1. Prerequisites
01 This eBook assumes that you are familiar with key Big Data definitions and terminologies. If not, please download and go through our first eBook in the Big Data series, Understanding Big Data.
02 The single-node Hadoop cluster should be ready. For a step-by-step guide on installing a Hadoop cluster, please refer to this eBook.
03 All the daemons of the single-node Hadoop cluster should be running. Refer to the screenshot below for this:
04 Eclipse should be installed on CentOS.
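One common way to check the daemons is the JDK's jps tool, which lists running Java processes one per line as a process id followed by a name. The sketch below parses such output and reports which daemons are missing. The class DaemonCheck, its method names, and the sample text are illustrative and not part of Hadoop; the expected list assumes a Hadoop 2.x single-node setup running both HDFS and YARN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: not part of Hadoop or of the original eBook.
final class DaemonCheck {
    // Daemons assumed for a Hadoop 2.x single-node cluster running HDFS and YARN.
    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager");

    // Returns the expected daemons that do not appear in the given jps output.
    static List<String> missingDaemons(String jpsOutput) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\r?\\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]); // parts[0] is the process id
            }
        }
        List<String> missing = new ArrayList<>();
        for (String daemon : EXPECTED) {
            if (!running.contains(daemon)) {
                missing.add(daemon);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample text typed in by hand; paste the real output of jps here.
        String sample = "2101 NameNode\n2230 DataNode\n2415 SecondaryNameNode\n"
                + "2570 ResourceManager\n2688 NodeManager\n2903 Jps\n";
        System.out.println("Missing daemons: " + missingDaemons(sample));
    }
}
```

An empty list means every expected daemon was found in the pasted output.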
www.acadgild.com 02
2. Step-by-Step Guide to Execute MapReduce Program
Follow the steps below to execute the MapReduce program:
2.1 Write the MapReduce Program in Eclipse
Before we start writing the MapReduce program, let us briefly look at Eclipse.
Eclipse is an IDE for developing Java applications. We will write the
MapReduce code in Eclipse and then export it as a jar file to the Hadoop
environment.
Follow the steps below to make a jar file and export it into the
Hadoop environment.
Step 1
Open Eclipse and click File -> New -> Java Project.
Step 2
Write the name of the project and then click Finish.
Step 3
Right-click on the source folder of the project. A pop-up menu will appear.
Click on New and select Class.
Step 4
Write the Main Class name. Check the box that generates the main method, public static void main(String[] args), so that the class includes it, as shown in the given screenshot.
You can go through the link given below to understand in depth how to
write your first MapReduce program in Hadoop.
https://drive.google.com/file/d/0Bxr27gVaXO5sRExfdndnaWN6WUk/view?usp=sharing
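As a rough picture of what a word-count program computes, the sketch below imitates the three phases in plain Java, with no Hadoop classes: a map step that emits a (word, 1) pair per token, a grouping step that stands in for Hadoop's shuffle and sort, and a reduce step that sums the values for each word. It is only a model of the data flow and not the code behind the link above; in a real Hadoop program the map and reduce logic go into subclasses of Hadoop's Mapper and Reducer. The class and method names are made up for illustration, and splitting on whitespace is an assumption about tokenization.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of word count; names are illustrative, not Hadoop API.
final class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle stand-in: group the emitted values by key, with keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all the values collected for one word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs all three phases over the input lines and returns word -> count.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("hadoop is fast");
        lines.add("hadoop is scalable");
        // prints {fast=1, hadoop=2, is=2, scalable=1}
        System.out.println(wordCount(lines));
    }
}
```

Keeping map, shuffle, and reduce as separate methods mirrors how Hadoop separates the phases, even though everything here runs in a single JVM.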
After writing the code in Eclipse, you will encounter many errors because Eclipse does not
have the required Hadoop libraries on the project's build path.
Refer to the screenshot below, where the errors are highlighted in yellow:
3. Removing the Errors
To remove the compile-time errors, add two jar files from the Hadoop
installation directory to the project.
Adding the two jar files provides the required references to the Hadoop
classes, and all the errors will be resolved.
Follow the steps below to add the external libraries.
Step 1
Right-click on the class name in the created Java project. Then select the option Build Path -> Configure Build Path.
Step 2
Select Libraries -> Add External JARs.
Step 3
Browse to the location where Hadoop is installed to add the Hadoop jar files to the project.
Step 4
Go to Hadoop -> share -> hadoop.
Step 5
Browse through the mapreduce directory and select the hadoop-mapreduce-client-core-2.6.0.jar file. Click OK to add that jar file.
Step 6
Similarly, add hadoop-common-2.6.0.jar from the common directory of Hadoop and then click OK to add that jar. (Refer to the given screenshot.)
Step 7
Click OK to add both the jars.
Step 8
The code becomes error-free after you add both the jar files. The screenshot below shows this.
Once the errors are removed, we need to export the code as a jar file to the Linux file system.
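Besides watching the error markers disappear in Eclipse, one possible double-check is to ask the JVM at run time whether a class from each jar can be found. The sketch below does that with Class.forName. The helper class ClasspathCheck is hypothetical; the two class names are real Hadoop classes (Mapper ships in hadoop-mapreduce-client-core and Text in hadoop-common), and the check only makes sense when it is run from the same project whose build path was just configured.

```java
// Hypothetical helper for checking that the Hadoop jars are visible.
final class ClasspathCheck {
    // Returns true if the named class can be located on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Mapper ships in hadoop-mapreduce-client-core; Text ships in hadoop-common.
        String[] required = {
            "org.apache.hadoop.mapreduce.Mapper",
            "org.apache.hadoop.io.Text"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

A MISSING line for either class suggests that the corresponding jar was not added to the build path.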
4. Making the Jar File and Exporting it to the Linux File System
The code is error-free now and needs to be exported as a jar file into the
Linux file system.
The jar file created from the source code needs to be executed in the Hadoop
environment, as it cannot be executed in Eclipse in MapReduce mode.
Follow the steps below to create a jar file from the source code and then
export it to the Linux OS where Hadoop is installed.
Step 1
Right-click on the source file, then select the Export option, as shown in the given screenshot.
Step 2
Give the name of the jar file and the location in the file system where you want
to export it. In this case, the name of the jar file is my_wc.jar and the location
is /home/acadgild/Desktop.
Step 3
Click on Next to proceed to the next step.
Step 4
Click on Next.
Step 5
Click on the Browse option to include the Main Class in the jar file.
Step 6
Select the Main Class name and then click OK.
Step 7
Click on Finish to export the jar file.
Step 8
The jar file has now been successfully exported to the specified location.
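Selecting the Main Class during the export records it as the Main-Class entry of the jar's manifest. To confirm that this entry made it into the exported file, a small check along the following lines may help. The helper class JarMainClass is hypothetical; it relies only on the standard java.util.jar API, and the default path is the export location used in this guide.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper for inspecting an exported jar.
final class JarMainClass {
    // Returns the Main-Class entry of a manifest, or null if there is none.
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    // Opens the jar at the given path and reads Main-Class from its manifest.
    static String mainClassOfJar(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return mainClassOf(jar.getManifest());
        }
    }

    public static void main(String[] args) throws IOException {
        // Default path is the export location used in this guide.
        String path = args.length > 0 ? args[0] : "/home/acadgild/Desktop/my_wc.jar";
        System.out.println("Main-Class: " + mainClassOfJar(path));
    }
}
```

If the program prints null for the Main-Class, the Main Class selection was probably skipped during the export.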
5. Executing MapReduce Program
The exported jar file is present in the /home/acadgild/Desktop directory. Let's look
at the steps involved in executing the MapReduce program.
Step 1
Create a sample input file, sample_input.
Step 2
The file below contains many words. We need to count the occurrences of each of
these words.
Step 3
Copy the input file to HDFS using the command hadoop dfs -put sample_input /. (In Hadoop 2.x the hadoop dfs form is deprecated; hdfs dfs -put sample_input / is the equivalent.)
Step 4
Type the command below to execute the MapReduce program.

The first argument in this command line is the jar file, the second is the
input file, and the third is the output location.
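The command itself appears only as a screenshot in the original. Assuming the standard hadoop jar form and the paths used elsewhere in this guide (the exported my_wc.jar, the /sample_input file copied in Step 3, and the /sample_out directory listed in the later steps), the sketch below assembles the argument list in the order just described. The helper class JobCommand is hypothetical. No class name follows the jar path because the Main Class was recorded in the jar during the export.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the command layout is an assumption, see the note above.
final class JobCommand {
    // Builds: hadoop jar <jarPath> <inputPath> <outputPath>
    static List<String> build(String jarPath, String inputPath, String outputPath) {
        List<String> command = new ArrayList<>();
        command.add("hadoop");
        command.add("jar");
        command.add(jarPath);    // first argument: the exported jar file
        command.add(inputPath);  // second argument: the input file in HDFS
        command.add(outputPath); // third argument: the output location in HDFS
        return command;
    }

    // Joins the parts with spaces so the line can be typed into a terminal.
    static String asShellLine(List<String> command) {
        StringBuilder sb = new StringBuilder();
        for (String part : command) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> command = build(
                "/home/acadgild/Desktop/my_wc.jar", "/sample_input", "/sample_out");
        // prints hadoop jar /home/acadgild/Desktop/my_wc.jar /sample_input /sample_out
        System.out.println(asShellLine(command));
        // To launch it from Java instead of a terminal, one option is:
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Hadoop refuses to start a job whose output directory already exists, so /sample_out should not be present in HDFS before the command is run.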
Step 5
The MapReduce job execution starts after you enter the command.
Step 6
The MapReduce job is now executed!
We can see a list of job-related counters, which give the details of the executed job.
Step 7
Type the command hadoop dfs -ls /sample_out to see the list of files present
in the output location specified in the job execution.
Step 8
Type the command hadoop dfs -cat /sample_out/part-r-00000 to view the
word count for the input file.
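If the job uses Hadoop's default text output format, each line of part-r-00000 holds a word, a tab character, and its count. Under that assumption, the sketch below turns such text (for example, saved from the -cat command) into a map for further processing. The helper class OutputParser and the sample text are made up for illustration and do not reproduce the actual output of the job in this guide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading word-count output saved from hadoop dfs -cat.
final class OutputParser {
    // Parses lines of the form "word<TAB>count" into a map, keeping file order.
    static Map<String, Long> parse(String text) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : text.split("\\r?\\n")) {
            int tab = line.lastIndexOf('\t');
            if (tab <= 0) {
                continue; // skip blank lines and lines without a word before the tab
            }
            String word = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample, not the actual output of the job in this guide.
        String sample = "fast\t1\nhadoop\t2\nis\t2\n";
        for (Map.Entry<String, Long> e : parse(sample).entrySet()) {
            System.out.println(e.getKey() + " occurs " + e.getValue() + " time(s)");
        }
    }
}
```

A non-numeric count makes Long.parseLong throw a NumberFormatException, which is left unhandled here to keep the sketch short.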
We hope this eBook has given you a clear understanding of the topic.
Are you looking for a great start to your career? Enroll in our programming course and boost your career.
Keep visiting our site www.acadgild.com/blog for more updates on Big Data and other technologies, or write to support@acadgild.com.
