How To Set Up A Hadoop Cluster in Docker
Apache Hadoop is a popular big data framework that is widely used in the
software industry. As a distributed system, Hadoop runs on clusters ranging
from a single node to thousands of nodes.
If you want to try out Hadoop, or don’t currently have access to a large Hadoop
cluster, you can set up a Hadoop cluster on your own computer
using Docker. Docker is a popular software container platform
that lets you build and ship your applications, along with their
environments, libraries and dependencies, in containers. Containers are
portable, so you can set up the exact same system on another machine by
running a few simple Docker commands. Thanks to Docker, it’s easy to build,
share and run your application anywhere, without depending on the
host operating system’s configuration.
For example, if your laptop runs Windows but you need an application that
only runs on Linux, Docker saves you from installing a new OS or setting up
a virtual machine by hand. You can create a Docker container with all the
libraries you need and delete it the moment you are done with your work.
https://shortcut.com/developer-how-to/how-to-set-up-a-hadoop-cluster-in-docker 1/10
06/02/2023, 11:19 How to set up a Hadoop cluster in Docker
In this tutorial, we will set up a 3-node Hadoop cluster using Docker and run
the classic Hadoop Word Count program to test the system.
1. Setting up Docker
If you don’t already have Docker installed, you can install it by following
the instructions on the official Docker website.
To check the version of your Docker Engine, Machine and Compose, use the
following commands:
$ docker --version
$ docker-compose --version
$ docker-machine --version
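If you are not sure which of these tools are installed, a small check like the one below can help before running the version commands. This is only a minimal sketch: check_tool is a hypothetical helper name, not part of Docker. On recent Docker installations docker-machine is often absent because it has been deprecated, so a "not found" for it is not necessarily a problem.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not a Docker command).
# It reports whether a given program can be found on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
        return 0
    else
        echo "$1: not found"
        return 1
    fi
}

# Check the three tools whose versions are queried in this section.
for tool in docker docker-compose docker-machine; do
    check_tool "$tool" || true
done
```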
If it’s your first time running Docker, test to make sure things are working
properly by launching your first Dockerized web server:
Since this is the first time you run this command and the image is not yet
available locally, Docker will pull it from Docker Hub. After
everything is finished, visit http://localhost to view the homepage of your new
server.
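The launch command itself is not shown above. As a rough sketch of what such a test could look like, the functions below assume the official nginx image from Docker Hub and a hypothetical container name, test-web-server. Neither is confirmed by this article, so treat both as placeholders.

```shell
#!/bin/sh
# Assumptions (not from the article): the official "nginx" image and the
# hypothetical container name "test-web-server".

start_test_server() {
    # -d runs the container in the background.
    # -p 80:80 publishes container port 80 on host port 80, which is the
    # port a browser uses for http://localhost.
    docker run -d -p 80:80 --name test-web-server nginx
}

stop_test_server() {
    # Stop the container, then delete it so the name can be reused.
    docker stop test-web-server
    docker rm test-web-server
}
```

Calling start_test_server launches the container, and stop_test_server cleans it up once you have seen the welcome page.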
$ docker-compose up -d
Docker Compose pulls the images from Docker Hub if they are not available
locally, builds any images that need building, and starts the
containers. After it finishes, you can use this command to check for currently
running containers:
$ docker ps
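The default docker ps table can be wide. As an optional sketch, the --format flag of docker ps accepts a Go template, which makes it easier to see at a glance which containers are up. The function name list_cluster_containers is hypothetical.

```shell
#!/bin/sh
# list_cluster_containers is a hypothetical helper name.
# It prints one line per running container: its name and its status.
list_cluster_containers() {
    docker ps --format '{{.Names}}: {{.Status}}'
}
```

Run list_cluster_containers after docker-compose up -d has finished. The container names it prints depend on your compose file.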
Now we can test the Hadoop cluster by running the classic WordCount
program.
First, we will create some simple input text files to feed into the
WordCount program:
$ mkdir input
$ echo "Hello World" >input/f1.txt
$ echo "Hello Docker" >input/f2.txt
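To know what result to expect from the cluster, you can count the words locally with standard shell tools first. The sketch below recreates the same two files in a temporary directory, so it does not touch your input folder. local_word_count is a hypothetical helper name, and this pipeline is a plain shell word count, not the Hadoop program.

```shell
#!/bin/sh
# Recreate the two tutorial input files in a scratch directory.
workdir=$(mktemp -d)
mkdir "$workdir/input"
echo "Hello World" > "$workdir/input/f1.txt"
echo "Hello Docker" > "$workdir/input/f2.txt"

# local_word_count is a hypothetical helper: it puts one word per line,
# sorts so equal words are adjacent, counts each run with uniq -c,
# and prints "word count" pairs.
local_word_count() {
    cat "$1"/*.txt | tr -s ' ' '\n' | sort | uniq -c | awk '{print $2, $1}'
}

local_word_count "$workdir/input"
# Prints:
# Docker 1
# Hello 2
# World 1
```

The Hadoop job should report the same counts, although the order of the lines may differ.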
To put the input files to all the datanodes on HDFS, use this command:
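The upload command is not shown in this copy. One possible way to do it is sketched below, assuming the namenode container is named namenode (check with docker ps) and that /tmp/input inside the container is free to use. Both are assumptions, and upload_input is a hypothetical helper name.

```shell
#!/bin/sh
# Assumption: the namenode container is called "namenode".
# Override by setting NAMENODE before sourcing this script.
NAMENODE=${NAMENODE:-namenode}

upload_input() {
    # Copy the local input directory into the container's filesystem.
    docker cp input "$NAMENODE":/tmp/input
    # Create a target directory in HDFS (relative to the HDFS home directory).
    docker exec "$NAMENODE" hdfs dfs -mkdir -p input
    # Upload both files from the container's filesystem into HDFS.
    docker exec "$NAMENODE" hdfs dfs -put /tmp/input/f1.txt /tmp/input/f2.txt input
}
```

Run upload_input from the directory that contains your input folder. HDFS itself decides which datanodes store the file blocks.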
Download the example Word Count program from this link (here I’m
downloading it to my Documents folder, which is the parent directory of my
docker-hadoop folder).
Now we need to copy the WordCount program from our local machine to our
Docker namenode container.
$ docker container ls
Copy the container ID of your namenode in the first column and use it in the
following command to start copying the jar file to your Docker Hadoop
cluster:
$ docker cp ../hadoop-mapreduce-examples-2.7.1-sources.j
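Instead of copying the container ID by hand, you can look it up with a filter. This is a sketch that assumes the container name contains the string namenode; namenode_id is a hypothetical helper name. Note that docker cp also accepts a container name in place of the ID.

```shell
#!/bin/sh
# namenode_id is a hypothetical helper name.
# --filter name=... matches containers whose name contains the given string;
# --format '{{.ID}}' prints only the container ID column.
namenode_id() {
    docker ps --filter name=namenode --format '{{.ID}}'
}
```

The printed ID can then be used as the destination container in docker cp, for example as "$(namenode_id)":/some/path, where /some/path is a location of your choosing.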
Now you are ready to run the WordCount program from inside namenode:
World 1
Docker 1
Hello 2
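The run command is not shown in this copy. A sketch of how the job could be started and its result read back is given below. It assumes a container named namenode, HDFS directories named input and output, and a placeholder JAR path that you must replace with wherever you copied the examples jar. All of these are assumptions, and run_word_count is a hypothetical helper name.

```shell
#!/bin/sh
# Assumptions: container name "namenode" and a placeholder jar location.
# Set JAR to the path inside the container where you copied the jar.
NAMENODE=${NAMENODE:-namenode}
JAR=${JAR:-/tmp/wordcount-examples.jar}

run_word_count() {
    # Run the WordCount example class on the HDFS "input" directory.
    # The job fails if the HDFS "output" directory already exists.
    docker exec "$NAMENODE" hadoop jar "$JAR" org.apache.hadoop.examples.WordCount input output
    # Print the result file written by the first reducer.
    docker exec "$NAMENODE" hdfs dfs -cat output/part-r-00000
}
```

If you rerun the job, remove the old output directory first or pick a new output name.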
To safely shut down the cluster and remove containers, use this command:
$ docker-compose down