Lecture7 1

Big Data Analytics
Mirza Touseef
RISE
Lecture 7.1 – Install and Configure Hadoop
Objectives
• How To Install and Configure Hadoop on

CentOS/RHEL
Big Data Analytics 2

Introduction
• Hadoop is a free, open-source and Java-based software
framework used for storage and processing of large
datasets on clusters of machines.
• It uses HDFS to store its data and process these data
using MapReduce.
• It is an ecosystem of Big Data tools that are primarily
used for data mining and machine learning.
• It has four major components such as Hadoop
Common, HDFS, YARN, and MapReduce.

Step 1 – Disable SELinux
• To disable SELinux, open the /etc/selinux/config file:
• Change the following line:
• Save the file when you are finished. Next, restart your system to
apply the SELinux changes.

Step 2 – Install Java
• Hadoop is written in Java and supports only Java version 8.
You can install OpenJDK 8 and ant using DNF command as
shown below:
• Once installed, verify the installed version of Java with the

following command:
• You should get the following output:

Step 3 – Create a Hadoop User
It is a good idea to create a separate user to run Hadoop for
security reasons.
• Run the following command to create a new user with
name hadoop:
• Next, set the password for this user with the following
command:
• Provide and confirm the new password as shown below:

Step 4 – Configure SSH Key-
based Authentication
Next, you will need to configure passwordless SSH
authentication for the local system.
• First, change the user to hadoop with the following
command:
• Next, run the following command to generate Public and

Private Key Pairs:

• You will be asked to enter the filename. Just press Enter to
complete the process:

• Next, append the generated public keys from id_rsa.pub
to authorized_keys and set proper permission:
• Next, verify the passwordless SSH authentication with the

following command:

• You will be asked to authenticate hosts by adding RSA keys
to known hosts. Type yes and hit Enter to authenticate the
localhost:

Step 5 – Install Hadoop
• First, change the user to hadoop with the following
command:
• Next, download the latest version of Hadoop using the

wget command:
• Once downloaded, extract the downloaded file:
• Next, rename the extracted directory to hadoop:

Next, you will need to configure Hadoop and Java
Environment Variables on your system.
• Open the ~/.bashrc file in your favorite text editor:
• Append the following lines:

• Save and close the file. Then, activate the environment
variables with the following command:
• Next, open the Hadoop environment variable file:

• doop-env.sh
• Update the JAVA_HOME variable as per your Java

installation path:
• Save and close the file when you are finished.

Step 6 – Configure Hadoop
First, you will need to create the namenode and datanode
directories inside Hadoop home directory:
• Run the following command to create both directories:
• Next, edit the core-site.xml file and update with your

system hostname:
• Change the following name as per your system hostname:

• Save and close the file. Then, edit the hdfs-site.xml file:
• Change the NameNode and DataNode directory path as

shown below:

• Save and close the file. Then, edit the mapred-site.xml file:
• Make the following changes:
• Save and close the file. Then, edit the yarn-site.xml file:

• Make the following changes:
• Save and close the file when you are finished.

Step 7 – Start Hadoop Cluster
Before starting the Hadoop cluster. You will need to format
the Namenode as a hadoop user.
• Run the following command to format the hadoop
Namenode:

• After formating the Namenode, run the following
command to start the hadoop cluster:
• Once the HDFS started successfully, you should get the

following output:
• Next, start the YARN service as shown below:

• You can now check the status of all Hadoop services using
the jps command:
• You should see all the running services in the following
output:

Step 8 – Access Hadoop Namenode
and Resource Manager
• To access the Namenode, open your web browser and
visit the URL http://your-server-ip:9870. You should see
the following screen:

Step 9 – Access Hadoop Namenode
and Resource Manager
• To access the Resource Manage, open your web browser
and visit the URL http://your-server-ip:8088. You should
see the following screen:

Step 10 – Verify the Hadoop
Cluster
At this point, the Hadoop cluster is installed and configured. Next, we will
create some directories in HDFS filesystem to test the Hadoop.
• Let’s create some directory in the HDFS filesystem using
the following command:
• Next, run the following command to list the above

directory:

Step 10 – Verify the Hadoop
Cluster
You can also verify the above directory in the Hadoop Namenode web
interface.
• Go to the Namenode web interface, click on the Utilities => Browse
the file system. You should see your directories which you have
created earlier in the following screen:

Step 11 – Stop Hadoop Cluster
You can also stop the Hadoop Namenode and Yarn service
any time by running the stop-dfs.sh and stop-yarn.sh script
as a Hadoop user.
• To stop the Hadoop Namenode service, run the following
command as a hadoop user:
• To stop the Hadoop Resource Manager service, run the

following command:

Conclusion
• In the above tutorial, you learned how to set up the

Hadoop single node cluster on CentOS 8. I hope you have
now enough knowledge to install the Hadoop in the
production environment.

Lecture7 1

Uploaded by

Copyright:

Available Formats

You might also like

Lecture7 1

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lecture7 1

Uploaded by

Copyright:

Available Formats

Big Data Analytics

• How To Install and Configure Hadoop on

Big Data Analytics 2

Big Data Analytics 3

• Change the following line:

Big Data Analytics 4

• Once installed, verify the installed version of Java with the

• You should get the following output:

Big Data Analytics 5

• Provide and confirm the new password as shown below:

Big Data Analytics 6

• Next, run the following command to generate Public and

Big Data Analytics 7

Big Data Analytics 8

• Next, verify the passwordless SSH authentication with the

Big Data Analytics 9

Big Data Analytics 10

• Next, download the latest version of Hadoop using the

• Once downloaded, extract the downloaded file:

• Next, rename the extracted directory to hadoop:

Big Data Analytics 11

• Append the following lines:

Big Data Analytics 12

• Next, open the Hadoop environment variable file:

• Update the JAVA_HOME variable as per your Java

• Save and close the file when you are finished.

Big Data Analytics 13

• Next, edit the core-site.xml file and update with your

Big Data Analytics 14

• Change the NameNode and DataNode directory path as

Big Data Analytics 15

• Make the following changes:

Big Data Analytics 16

• Save and close the file when you are finished.

Big Data Analytics 17

• You should get the following output:

Big Data Analytics 18

• Once the HDFS started successfully, you should get the

• Next, start the YARN service as shown below:

Big Data Analytics 19

Big Data Analytics 20

Big Data Analytics 21

Big Data Analytics 22

• Next, run the following command to list the above

• You should get the following output:

Big Data Analytics 23

Big Data Analytics 24

• To stop the Hadoop Resource Manager service, run the

Big Data Analytics 25

• In the above tutorial, you learned how to set up the

Big Data Analytics 26

You might also like