
HADOOP INSTALLATION STEP BY STEP PROCESS ON LINUX (SINGLE NODE CLUSTER)

Step 1 — Installing Java


First we need to update the package list

 sudo apt-get update

Next, install the default Java Development Kit (JDK) for Ubuntu:

 sudo apt-get install default-jdk

Once the installation is complete, let's check the version.

 java -version
 Output

java version "1.8.0_151"

Java(TM) SE Runtime Environment (build 1.8.0_151-b12)

Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)

This output verifies that the JDK has been successfully installed. (If default-jdk installed OpenJDK, the first line will read openjdk version rather than java version; either confirms that Java is ready.)
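
Optionally, you can also confirm that the Java compiler was installed (a quick sanity check, not part of the original procedure):

 javac -version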

Step 2 — Installing Hadoop


With Java in place, we'll visit the Apache Hadoop Releases page to find the most recent
stable release, then follow the link for the binary of the current release:


Alternatively, simply run the following command in the terminal. On the server, we'll use wget to
fetch it:

 wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz

Note: The Apache website will direct you to the best mirror dynamically, so your URL may
not match the URL above.
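
Before extracting, it's good practice to verify that the download wasn't corrupted. A minimal check is to compute the SHA-256 digest with sha256sum (part of coreutils on Ubuntu) and compare it by eye against the checksum published on the Apache releases page:

 sha256sum hadoop-2.9.0.tar.gz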

Once we've verified that the file wasn't corrupted or changed, we'll use the tar
command with the -x flag to extract, -z to uncompress, -v for verbose output, and -f to
specify that we're extracting from a file. Use tab-completion or substitute the correct
version number in the command below:
tar -xzvf hadoop-2.9.0.tar.gz

Finally, we'll move the extracted files into /usr/local, the appropriate place for locally
installed software. Change the version number, if needed, to match the version you
downloaded.

 sudo mv hadoop-2.9.0 /usr/local/hadoop
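
To confirm the move succeeded, list the new directory; you should see Hadoop's top-level folders such as bin, etc, sbin, and share:

 ls /usr/local/hadoop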

With the software in place, we're ready to configure its environment.

Step 3 — Configuring Hadoop's Java Home


Hadoop requires that you set the path to Java, either as an environment variable or in
the Hadoop configuration file.
The path to Java, /usr/bin/java, is a symlink to /etc/alternatives/java, which is
in turn a symlink to the default Java binary. We will use readlink with the -f flag to follow
every symlink in every part of the path, recursively. Then, we'll use sed to
trim bin/java from the output to give us the correct value for JAVA_HOME.

To find the default Java path:

 readlink -f /usr/bin/java | sed "s:bin/java::"

Output
/usr/lib/jvm/java-8-openjdk-amd64/jre/
You can copy this output to set Hadoop's Java home to this specific version, which
ensures that if the default Java changes, this value will not. Alternatively, you can use
the readlink command dynamically in the file so that Hadoop will automatically use
whatever Java version is set as the system default.
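
You can inspect this symlink chain yourself to see why readlink -f is needed (purely illustrative; the exact targets on your machine may differ):

 ls -l /usr/bin/java /etc/alternatives/java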

To begin, open hadoop-env.sh:

 sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Then, choose one of the following options:

Option 1: Set a Static Value


/usr/local/hadoop/etc/hadoop/hadoop-env.sh

. . .
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
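
Option 2: Use a Dynamic Value

Alternatively, as described above, you can have Hadoop resolve the Java path at runtime so that it always tracks the system default. This sketch simply reuses the readlink command from earlier inside the same file:

/usr/local/hadoop/etc/hadoop/hadoop-env.sh

. . .
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")

In either case, save and close the file when you're done.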

Step 4 — Running Hadoop


Now we should be able to run Hadoop:

 /usr/local/hadoop/bin/hadoop

Output

Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

This help output means we've successfully configured Hadoop to run in stand-alone mode. To
make sure the installation succeeded, check the Hadoop version:

/usr/local/hadoop/bin/hadoop version

If everything is in order, it will produce output similar to:

Hadoop 2.9.0

Subversion https://shv@git-wip-us.apache.org/repos/asf/hadoop.git
-r 18065c2b6806ed4aa6a3187d77cbe21bb3dba075

Compiled by kshvachk on 2017-12-16T01:06Z

Compiled with protoc 2.5.0

From source with checksum 9f118f95f47043332d51891e37f736e9

This command was run using
/usr/local/hadoop/share/hadoop/common/hadoop-common-2.9.0.jar

We'll ensure that it is functioning properly by running the example MapReduce program
it ships with. To do so, create a directory called input in our home directory:

 mkdir ~/input

Create a text file (e.g. text.txt) inside the input folder. Here our file is named text.txt and
contains some sample text: "Hi, his name is Himadri, his favorite place is Dehradun
which is located at Uttarakhand. This information will help you to search hi in all word
containing hi". This text lets us search for hi within every word that contains it.
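
One way to create the file from the terminal (the name and contents here simply follow the example above; your exact match counts will depend on what you type):

 echo "Hi, his name is Himadri, his favorite place is Dehradun which is located at Uttarakhand. This information will help you to search hi in all word containing hi" > ~/input/text.txt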

Now copy Hadoop's configuration files into input to use those files as our data.

 cp /usr/local/hadoop/etc/hadoop/*.xml ~/input
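
You can list the input directory to confirm what the job will have available to read:

 ls ~/input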

Next, we can use the following command to run the hadoop-mapreduce-examples
program, a Java archive with several options. We'll invoke its grep program,
one of many examples included in hadoop-mapreduce-examples, followed by our input
file text.txt and the output directory ~/output. The MapReduce grep program
will count the matches of a literal word or regular expression. Finally, we'll supply a regular
expression to find occurrences of the word hi, optionally followed by periods. The
expression is case-sensitive, so we wouldn't match the word if it
were capitalized at the beginning of a sentence:

/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar grep ~/input/text.txt ~/output 'hi[.]*'
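
If you'd like to preview what the job will count before running it, an ordinary grep over the same file prints each match on its own line (just an illustration; the MapReduce job does the actual counting):

 grep -o 'hi[.]*' ~/input/text.txt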

When the task completes, it provides a summary of what has been processed and any
errors it encountered, but this doesn't contain the actual results.

Output
. . .
File System Counters
FILE: Number of bytes read=1184518
FILE: Number of bytes written=2348362
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=1

Map output bytes=11
Map output materialized bytes=19
Input split bytes=114
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=19
Reduce input records=1
Reduce output records=1
Spilled Records=2
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=62
Total committed heap usage (bytes)=336732160

Note: If the output directory already exists, the program will fail, and rather than seeing the
summary, the output will look something like:
Output
. . .
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
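
If that happens, remove the output directory and run the job again; Hadoop deliberately refuses to overwrite existing results:

 rm -r ~/output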
Results are stored in the output directory and can be checked by running cat on the
output directory:

 cat ~/output/*

Output
7 hi
The MapReduce task found 7 occurrences of the word hi in the text file (the expression
hi[.]* would also match hi followed by periods, though there were none here). Running the
example program has verified that our stand-alone installation is working properly and that
non-privileged users on the system can run Hadoop for exploration or debugging.

Designed and modified by Mr. Amitava Choudhury and Ms. Ambika Agarwal, Assistant Professors, UPES.
Ref. www.digitalocean.com
