
1/1/22, 7:56 PM Running an MPI Cluster within a LAN · MPI Tutorial


Running an MPI Cluster within a LAN


Author: Dwaraka Nath

Earlier, we looked at running MPI programs on a single machine to process code in parallel, taking advantage of having more than one CPU core. Now, let's widen our scope a bit and move from a single computer to a network of nodes connected together in a Local Area Network. To keep things simple, let's consider just two computers for now. It is fairly straightforward to extend the same setup to many more nodes.

As with other tutorials, I am assuming you run Linux machines. The following tutorial was tested with Ubuntu, but it should be much the same on any other distribution. Also, let's call your machine manager and the other one worker.

Prerequisite
If you have not installed MPICH2 on each of the machines, follow the steps here.

Step 1: Configure your hosts file


You are going to need to communicate between the computers, and you don't want to type in IP addresses every time. Instead, you can give a name to each node in the network that you wish to communicate with. The hosts file is used by your operating system to map hostnames to IP addresses.

$ cat /etc/hosts

127.0.0.1 localhost

172.50.88.34 worker

Here, worker is the machine you'd like to do your computation with. Likewise, on the worker, add an entry for manager.
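If you prefer to script this step, the sketch below appends a hostname mapping only when the name is not already listed, so running it twice does no harm. The function add_host_entry is a hypothetical helper, not part of any package, and the address is the example one used above. Editing the real /etc/hosts requires root, so you would call it from a root shell (for example, after sudo -s).

```shell
# Hypothetical helper: add "IP NAME" to a hosts-format file unless NAME is
# already mapped there. Editing the real /etc/hosts requires root.
add_host_entry() {
    file=$1 ip=$2 name=$3
    # awk exits 0 if any non-comment line lists NAME among its hostnames.
    if awk -v n="$name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file" 2>/dev/null; then
        echo "$name is already in $file"
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Example (as root): on manager, map the worker; on the worker, map manager.
# add_host_entry /etc/hosts 172.50.88.34 worker
```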

Step 2: Create a new user


Though you can operate your cluster with your existing user account, I'd recommend creating a new one to keep the configuration simple. Let's create a new user called mpiuser. Create this account with the same username on all the machines.

https://mpitutorial.com/tutorials/running-an-mpi-cluster-within-a-lan/ 1/8

$ sudo adduser mpiuser

Follow the prompts and you should be fine. Don't use the useradd command to create the user, as by default it does not create a home directory for the new user.
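The same username is what matters for SSH, but files on the NFS share set up in Step 4 are owned by numeric UIDs: with NFS's default AUTH_SYS security, the server sees the client's numeric UID, not the name. So it also helps if mpiuser gets the same UID on every machine. The sketch below is one way to check. The functions uid_of and same_uid_on are hypothetical helpers, and the remote comparison only works once SSH is configured in Step 3.

```shell
# Hypothetical helper: print the numeric UID of USER from a passwd-format
# file (the local /etc/passwd by default).
uid_of() {
    awk -F: -v u="$1" '$1 == u { print $3; exit }' "${2:-/etc/passwd}"
}

# Compare the local UID of USER with its UID on HOST (needs working ssh).
# "id -u USER" is the standard way to ask a machine for a user's UID.
same_uid_on() {
    user=$1 host=$2
    local_uid=$(uid_of "$user")
    remote_uid=$(ssh "$host" id -u "$user")
    if [ -n "$local_uid" ] && [ "$local_uid" = "$remote_uid" ]; then
        echo "$user has UID $local_uid on both machines"
    else
        echo "UID mismatch for $user: local '$local_uid', $host '$remote_uid'" >&2
        return 1
    fi
}

# Example: same_uid_on mpiuser worker
```

If the numbers differ, adduser's --uid option lets you choose the UID explicitly when creating the account, for example sudo adduser --uid 1001 mpiuser (1001 is only an example value).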

Step 3: Setting up SSH


Your machines will talk to each other over the network via SSH and share data via NFS, which we'll cover a little later. Install the SSH server on each machine:

$ sudo apt-get install openssh-server

Right after that, log in with your newly created account:

$ su - mpiuser

Since the SSH server is already installed, you should be able to log in to other machines with ssh username@hostname, at which point you will be prompted for that user's password. To make logging in easier, we generate a key pair and copy the public key to the list of authorized_keys on each of the other machines.

$ ssh-keygen -t rsa

You can also generate ed25519 keys (ssh-keygen -t ed25519); either choice is fine. Avoid DSA keys, though: OpenSSH 7.0 and later disable them by default, so with a DSA key you may keep getting password prompts even after copying it over. Now, add the generated public key to each of the other computers. In our case, that is the worker machine.

$ ssh-copy-id worker #ip-address may also be used

Repeat the above step for each of the worker machines and for your own user on localhost.

This sets up key-based authentication, so you can communicate securely with the worker machines. ssh into every machine once, so that each one gets added to your list of known_hosts. This is a very simple but essential step; skip it and passwordless ssh will be troublesome.

Now, to enable passwordless ssh when your key is protected by a passphrase, load it into an SSH agent:

$ eval `ssh-agent`

$ ssh-add ~/.ssh/id_rsa


Now, assuming you've properly added your keys to the other machines, you should be able to log in to them without a password prompt:

$ ssh worker

Note: Since I've assumed that you created mpiuser as the common user account on all of the machines, this should just work. If you've created user accounts with different names on the manager and worker machines, you'll need to work around that.
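With several workers, the key distribution and login check from this step can be looped. The sketch below is a hypothetical helper, not part of OpenSSH: it runs ssh-copy-id for each host and then connects with ssh -o BatchMode=yes, which makes ssh fail instead of prompting, so success means key-based login really works. The DRY_RUN variable is an assumption added only so you can preview the commands first.

```shell
# Hypothetical helper: copy your public key to each HOST and confirm that
# key-based (passwordless) login works. Set DRY_RUN=1 to only print the
# commands it would run.
setup_ssh_hosts() {
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh-copy-id $host"
            echo "ssh -o BatchMode=yes $host true"
            continue
        fi
        # First contact also asks you to accept the host key (known_hosts).
        ssh-copy-id "$host" || { echo "ssh-copy-id failed for $host" >&2; return 1; }
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ssh -o BatchMode=yes "$host" true; then
            echo "$host: passwordless ssh OK"
        else
            echo "$host: key-based login failed" >&2
            return 1
        fi
    done
}

# Example, as mpiuser on the manager:
# setup_ssh_hosts localhost worker
```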

Step 4: Setting up NFS


The manager shares a directory via NFS, and the worker mounts it so that the machines can exchange data.

NFS-Server
Install the required packages:

$ sudo apt-get install nfs-kernel-server

Now, assuming you are still logged in as mpiuser, let's create a folder named cloud that we will share across the network.

$ mkdir cloud

To export the cloud directory, create an entry in /etc/exports (editing this file requires sudo):

$ cat /etc/exports

/home/mpiuser/cloud *(rw,sync,no_root_squash,no_subtree_check)

Here, instead of * you can give the specific IP address you want to share this folder with. Using * simply makes our job easier, at the cost of exporting the folder to any host that can reach the manager.

rw: Enables both reading and writing (ro is for read-only).
sync: The server replies to requests only after changes to the shared directory have been committed to storage.
no_subtree_check: Disables subtree checking. When a shared directory is a subdirectory of a larger file system, NFS scans every directory above it to verify its permissions and details. Disabling the subtree check may increase the reliability of NFS, but reduces security.
no_root_squash: By default, NFS maps requests from the root user on a client to an unprivileged user; this option turns that off, so root on the client keeps root privileges on the shared folder.
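As mentioned above, * can be replaced by a specific client. The sketch below builds an /etc/exports line for a given client specification; exports_line is a hypothetical helper, and 172.50.88.0/24 is an assumed subnet inferred from the example addresses in this tutorial. The exports format accepts single hosts, IP addresses, and networks written as address/prefix.

```shell
# Hypothetical helper: print one /etc/exports line sharing DIR with CLIENT,
# using the same options as the entry above.
exports_line() {
    dir=$1 client=${2:-*}   # default to "*", i.e. any host, as in the tutorial
    printf '%s %s(rw,sync,no_root_squash,no_subtree_check)\n' "$dir" "$client"
}

# Example: limit the share to the LAN instead of every host.
# exports_line /home/mpiuser/cloud 172.50.88.0/24 | sudo tee -a /etc/exports
```

Note that there is no space between the client and the parenthesised options; a space there would apply the options to all hosts instead of to that client.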

Thanks to Digital Ocean for help with the tutorial and explanations.

Content reused under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. For information, read here.

After you have made the entry, run the following.

$ sudo exportfs -a

Run the above command every time you make a change to /etc/exports.

If required, restart the NFS server:

$ sudo service nfs-kernel-server restart

NFS-worker
Install the required packages:

$ sudo apt-get install nfs-common

On the worker, create a directory with the same name, cloud, in mpiuser's home directory:

$ mkdir cloud

Now, mount the shared directory:

$ sudo mount -t nfs manager:/home/mpiuser/cloud ~/cloud

To check the mounted directories:

$ df -h

Filesystem Size Used Avail Use% Mounted on

manager:/home/mpiuser/cloud 49G 15G 32G 32% /home/mpiuser/cloud
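df -h lists every file system, so in a script it can be handier to ask whether one particular directory is an NFS mount. The sketch below reads /proc/mounts, the Linux kernel's table of current mounts, whose second and third fields are the mount point and file system type. The name is_nfs_mounted is hypothetical, and the optional second argument exists only so the function can be tried against a sample file.

```shell
# Hypothetical helper: succeed if DIR is mounted with type nfs or nfs4.
# Reads the kernel's mount table (Linux /proc/mounts) by default.
is_nfs_mounted() {
    dir=$1 table=${2:-/proc/mounts}
    awk -v d="$dir" '
        $2 == d && ($3 == "nfs" || $3 == "nfs4") { ok = 1 }
        END { exit !ok }
    ' "$table"
}

# Example on the worker:
# is_nfs_mounted /home/mpiuser/cloud && echo "cloud is shared" || echo "cloud is not mounted"
```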

To make the mount permanent, so you don't have to mount the shared directory manually every time you reboot, you can add an entry to your file systems table, i.e., the /etc/fstab file:

$ cat /etc/fstab

#MPI CLUSTER SETUP

manager:/home/mpiuser/cloud /home/mpiuser/cloud nfs
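The entry above leaves out the last three fstab fields (options, dump, and pass). A slightly more explicit variant is sketched below: it spells those fields out and adds _netdev, a standard mount option marking a file system that needs network access, so the system does not try to mount it before the network is up. The helper add_fstab_entry is hypothetical. Run it as root against the real /etc/fstab; afterwards, mount -a mounts everything listed there that is not already mounted.

```shell
# Hypothetical helper: append an NFS line to a fstab-format file, unless
# that mount point is already listed.
add_fstab_entry() {
    file=$1 remote=$2 mountpoint=$3
    # Field 2 of each non-comment fstab line is the mount point.
    if awk -v m="$mountpoint" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$file"; then
        echo "$mountpoint is already in $file"
        return 0
    fi
    # Fields: device, mount point, type, options, dump, pass.
    printf '%s %s nfs defaults,_netdev 0 0\n' "$remote" "$mountpoint" >> "$file"
}

# Example (as root on the worker), then mount everything listed in fstab:
# add_fstab_entry /etc/fstab manager:/home/mpiuser/cloud /home/mpiuser/cloud
# mount -a
```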


Step 5: Running MPI programs


For the sake of example, let's take a sample program that comes with the MPICH2 installation package, mpich2/examples/cpi. We'll take this executable and run it in parallel.

Or, if you want to compile your own code, say mpi_sample.c, compile it as shown below to generate an executable named mpi_sample.

$ mpicc -o mpi_sample mpi_sample.c

First, copy your executable into the shared directory cloud, or better yet, compile your code within the NFS shared directory, so that every node finds the executable at the same path.

$ cd cloud/

$ pwd

/home/mpiuser/cloud

To run it only on your machine:

$ mpirun -np 2 ./cpi # No. of processes = 2

Now, to run it across the cluster:

$ mpirun -np 5 -hosts worker,localhost ./cpi

#hostnames can also be substituted with ip addresses.

Or list the hosts in a hostfile (here named mpi_file) and run:

$ mpirun -np 5 --hostfile mpi_file ./cpi
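The tutorial does not show the contents of mpi_file. With MPICH's Hydra launcher, a hostfile lists one host per line, optionally followed by a colon and the number of processes to place there (for example worker:3). The sketch below writes such a file; write_hostfile is a hypothetical helper, and the process counts are placeholders you would match to each machine's cores. Open MPI uses a different syntax (worker slots=3), so adjust accordingly if you use it instead of MPICH.

```shell
# Hypothetical helper: write an MPICH (Hydra) hostfile where each line is
# "host:count", count being how many processes to place on that host.
# Arguments: output file, then one or more host or host:count entries.
write_hostfile() {
    out=$1
    shift
    : > "$out"                      # start from an empty file
    for entry in "$@"; do
        host=${entry%%:*}           # text before the first colon
        count=${entry#*:}           # text after the first colon
        if [ "$host" = "$entry" ]; then
            count=1                 # no ":count" given, default to 1
        fi
        printf '%s:%s\n' "$host" "$count" >> "$out"
    done
}

# Example: 2 processes on the manager and 3 on the worker, 5 in total.
# write_hostfile mpi_file manager:2 worker:3
# mpirun -np 5 --hostfile mpi_file ./cpi
```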

This should launch your program on all of the hosts you listed.

Common errors and tips


Make sure all the machines you are trying to run the executable on have the same version of MPI. MPICH2 is recommended.
The hosts file of manager should contain the local network IP address entries of manager and all of the worker nodes. For each of the workers, you need the IP address entries of manager and of that worker node itself.

For example, the hosts file of a manager node could look like this:

$ cat /etc/hosts

127.0.0.1 localhost

#127.0.1.1 1944

#MPI CLUSTER SETUP

172.50.88.22 manager

172.50.88.56 worker1

172.50.88.34 worker2

172.50.88.54 worker3

172.50.88.60 worker4

172.50.88.46 worker5

A sample hosts file of the worker3 node could be:

$ cat /etc/hosts

127.0.0.1 localhost

#127.0.1.1 1947

#MPI CLUSTER SETUP

172.50.88.22 manager

172.50.88.54 worker3
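The rule above (manager knows every node; each worker knows only the manager and itself) can be generated from a single cluster list instead of being typed on each machine. In the sketch below, hosts_block_for is a hypothetical helper, and the cluster list file (one "IP NAME" pair per line, manager first) is an assumed input format. The output is the block to place in that node's /etc/hosts.

```shell
# Hypothetical helper: print the "#MPI CLUSTER SETUP" block for NODE's hosts
# file from a cluster list whose first entry is the manager. The manager
# gets every entry; a worker gets only the manager and itself.
hosts_block_for() {
    list=$1 node=$2
    echo "#MPI CLUSTER SETUP"
    awk -v node="$node" '
        NF < 2 || $1 ~ /^#/ { next }    # skip blank and comment lines
        { n++ }
        n == 1 { manager = $2 }         # the first real entry is the manager
        n == 1 || node == manager || $2 == node { print $1, $2 }
    ' "$list"
}

# Example, with cluster.txt as a hypothetical list file:
# hosts_block_for cluster.txt worker3
```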

Whenever you run a process in parallel using MPI, you can either run it locally or run it on a combination of local and remote nodes. You cannot invoke a process only on other nodes.

To make this clearer: from the manager node, the following can be invoked.

$ mpirun -np 10 --hosts manager ./cpi

# To run the program only on the same manager node

The following will also run perfectly:

$ mpirun -np 10 --hosts manager,worker1,worker2 ./cpi

# To run the program on manager and worker nodes.

However, the following is not correct and will result in an error if invoked from manager:

$ mpirun -np 10 --hosts worker1 ./cpi

# Trying to run the program only on remote worker
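Following the rule above, a launch script can make sure the local node always stays in the host list. In the sketch below, cluster_hosts is a hypothetical helper: it builds the comma-separated list for -hosts, keeping the first argument (your local node) first and dropping repeated names, so the invocation keeps the "local plus remote" form shown above.

```shell
# Hypothetical helper: print a comma-separated host list for mpirun -hosts,
# keeping the first argument first and dropping any repeated names.
cluster_hosts() {
    list=""
    for h in "$@"; do
        case ",$list," in
            *",$h,"*) ;;                      # already in the list, skip it
            *) list=${list:+$list,}$h ;;      # append, adding a comma if needed
        esac
    done
    printf '%s\n' "$list"
}

# Example on the manager (first argument is the local node):
# mpirun -np 10 -hosts "$(cluster_hosts manager worker1 worker2)" ./cpi
```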


So, what’s next?


Exciting, isn't it, to have built a cluster to run your code? Next, you need to learn the specifics of writing a program that can run in parallel. The best place to start is the MPI hello world lesson. Or, if you want to replicate this setup using Amazon EC2 instances, have a look at building and running your own cluster on Amazon EC2. For all the other lessons, go to the MPI tutorials page.

Should you have any issues setting up your local cluster, please don't hesitate to comment below so we can try to sort them out.

Want to contribute?
This site is hosted entirely on GitHub. This site is no longer being actively
contributed to by the original author (Wes Kendall), but it was placed on GitHub
in the hopes that others would write high-quality MPI tutorials. Click here for more
information about how you can contribute.


Comments

Harɩsh • 9 months ago

$ mpirun -np 5 -hosts worker,localhost ./cpi

The command should use -host instead of -hosts.

Heidar Ahangari • a year ago

Hi, is it possible to cluster systems connected by LAN for Abaqus on Windows OS?

Ali Rojas • a year ago

@Dwaraka Nath, I tried with the DSA key, but it kept asking for a password. Then I tried an RSA key and it finally stopped asking for it. I would be grateful for your comments.

Ali Rojas • a year ago

Should the key be created and distributed only by the manager, or do the workers have to do it too?

© 2021 MPI Tutorial. All rights reserved.
