The document provides information about the Campus Cluster computing resource at the University of Illinois. It describes the Campus Cluster as a batch job system for high throughput computing across its ~450 nodes. It outlines available resources, when the Cluster is not suitable for instant or high I/O tasks, and how to request an account and connect. It discusses data storage locations and backups, and how to load software modules. The document also covers submitting and configuring batch jobs, monitoring jobs, and managing jobs using command line tools. Common issues like running the same job with parameters and passing parameters to job scripts are addressed.
What is the Campus Cluster?
A batch job system for high-throughput, high-latency computing.
Available resources:
~450 nodes
12 cores/node
24-96 GB memory per node
Shared high-performance filesystem
High-speed multi-node message passing

What isn't the Campus Cluster?
Not an instantly available computation resource - you can wait up to 4 hours for a node.
Not high-I/O friendly - network disk access can hurt performance.

Getting Set Up
Getting started
Request an account: https://campuscluster.illinois.edu/invest/user_form.html
Connecting: ssh to taub.campuscluster.illinois.edu using your netid and AD password.
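For example, from your own machine (the netid here is only a placeholder):

ssh yournetid@taub.campuscluster.illinois.edu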
Where to put data
Home directory (~/): backed up; currently no quota (in the future, tens of GB).
/scratch (~10 TB): for temporary data; available on all nodes; no backup; scratch data is currently deleted after ~3 months.
/scratch.local (~100 GB): local to each node, not shared across the network; beware that other users may fill the disk.
/projects/VisionLanguage/ (~15 TB): backed up; keep things tidy by creating a directory for your netid.
Current filesystem best practices (should improve for Cluster v. 2):
Try to do batched writes to one large file.
Avoid many little writes to many little files.
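As a rough sketch (the script and file names are made up), prefer appending results to a single file over creating one tiny file per task:

# Avoid:  ./script ${idx} > result_${idx}.txt   (hundreds of tiny files)
# Prefer one large file:
for idx in {1..100}; do
  ./script ${idx} >> results_all.txt
done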
Backup = snapshots (just learned this yesterday):
Snapshots are taken daily.
Not intended for disaster recovery - snapshots are stored on the same disk as the data.
Intended for accidental deletes/overwrites, etc.
Backed up data can be accessed at: /gpfs/ddn_snapshot/.snapshots/<date>/<path>
e.g. to recover an accidentally deleted file from a home directory: /gpfs/ddn_snapshot/.snapshots/2012-12-24/home/iendres2/christmas_list

Moving data to/from the cluster
The only option right now is sftp/scp.
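For example (file names and netid are only placeholders), from your local machine:

# Copy a local file up to your cluster home directory
scp data.tar.gz yournetid@taub.campuscluster.illinois.edu:~/
# Copy results back down
scp yournetid@taub.campuscluster.illinois.edu:~/workdir/results.tar.gz .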
SSHFS lets you mount a directory from a remote machine. Haven't tried this, but it might be useful.

Modules
[iendres2 ~]$ module load <modulename>
Manages the environment; typically used to add software to your path.
To get the latest version of matlab:
[iendres2 ~]$ module load matlab/7.14
To find modules such as vim or svn:
[iendres2 ~]$ module avail

Useful Startup Options
Appended to the end of my .bashrc:
# Make default permissions the same for user and group, useful when working on a joint project
umask u=rwx,g=rwx
# Safer alternative - don't allow group writes
umask u=rwx,g=rx
# Load common modules
module load vim
module load svn
module load matlab
Submitting Jobs

Queues
Primary (VisionLanguage): nodes we own (currently 8); jobs can last 72 hours; we have priority access.
Secondary (secondary): anyone else's idle nodes (~500); jobs can only last 4 hours and are automatically killed after that; it is not unusual to wait 12 hours for a job to begin running.

Scheduler
Typically behaves as first-come, first-served.
There are claims of priority scheduling, but we don't know how it works.

Types of jobs
Batch jobs: no graphics; the job runs and completes without user interaction.
Interactive jobs: bring a remote shell to your terminal; X-forwarding is available for graphics.
Both wait in the queue the same way.

Scheduling jobs
Batch jobs:
[iendres2 ~]$ qsub <job_script>
The job_script defines the parameters of the job and the actual command to run (details on job scripts to follow).
Interactive jobs:
[iendres2 ~]$ qsub -q <queuename> -I -l walltime=00:30:00,nodes=1:ppn=12
Include -X for X-forwarding (details on the -l parameters to follow).
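As a quick sketch of a typical round trip (the job id and file names are illustrative), Torque/PBS normally writes a batch job's stdout and stderr to <jobname>.o<jobid> and <jobname>.e<jobid> in the submission directory once the job finishes:

[iendres2 ~]$ qsub job_script
345678.taubm1
[iendres2 ~]$ qstat 345678                 # check its state
[iendres2 ~]$ cat job_script.o345678       # stdout, available after completion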
Configuring Jobs

Basics
The parameters of a job are defined by a bash script which contains PBS directives followed by the script to execute:

#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00
cd ~/workdir/
echo This is job number ${PBS_JOBID}

Queue to use (-q): VisionLanguage or secondary.
Number of nodes: 1, unless using MPI or other distributed programming.
Processors per node (ppn): always 12; the smallest computation unit is a physical node, which has 12 cores (with current hardware).*
*Some queues are configured to allow multiple concurrent jobs per node, but this is uncommon.
Walltime: the maximum time the job will run for - it is killed if it exceeds this; 72:00:00 hours for the primary queue, 04:00:00 hours for the secondary queue.
Bash commands are allowed anywhere in the script and will be executed on the scheduled worker node after all PBS directives are handled.
There are some reserved variables that the scheduler will fill in once the job is scheduled (see `man qsub` for more variables).

Scheduler variables (from the qsub manpage):
PBS_O_HOST the name of the host upon which the qsub command is running.
PBS_SERVER the hostname of the pbs_server which qsub submits the job to.
PBS_O_QUEUE the name of the original queue to which the job was submitted.
PBS_O_WORKDIR the absolute path of the current working directory of the qsub command.
PBS_ARRAYID each member of a job array is assigned a unique identifier (see -t)
PBS_ENVIRONMENT set to PBS_BATCH to indicate the job is a batch job, or to PBS_INTERACTIVE to indicate the job is a PBS interactive job, see -I option.
PBS_JOBID the job identifier assigned to the job by the batch system.
PBS_JOBNAME the job name supplied by the user.
PBS_NODEFILE the name of the file containing the list of nodes assigned to the job (for parallel and cluster systems).
PBS_QUEUE the name of the queue from which the job is executed.
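As a sketch (not from the original slides), a job script might use a few of these variables, for example to run from the directory where qsub was invoked and to list the assigned nodes:

#PBS -q secondary
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00
# Run from the directory where qsub was invoked
cd ${PBS_O_WORKDIR}
echo "Job ${PBS_JOBID} (${PBS_JOBNAME}) running on:"
cat ${PBS_NODEFILE}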
Monitoring Jobs
[iendres2 ~]$ qstat
Sample output:
JOBID            JOBNAME           USER      WALLTIME  STATE  QUEUE
333885[].taubm1  r-afm-average     hzheng8   0         Q      secondary
333899.taubm1    test6             lee263    03:33:33  R      secondary
333900.taubm1    cgfb-a            dcyang2   09:22:44  R      secondary
333901.taubm1    cgfb-b            dcyang2   09:31:14  R      secondary
333902.taubm1    cgfb-c            dcyang2   09:28:28  R      secondary
333903.taubm1    cgfb-d            dcyang2   09:12:44  R      secondary
333904.taubm1    cgfb-e            dcyang2   09:27:45  R      secondary
333905.taubm1    cgfb-f            dcyang2   09:30:55  R      secondary
333906.taubm1    cgfb-g            dcyang2   09:06:51  R      secondary
333907.taubm1    cgfb-h            dcyang2   09:01:07  R      secondary
333908.taubm1    ...conp5_38.namd  harpole2  0         H      cse
333914.taubm1    ktao3.kpt.12      chandini  03:05:36  C      secondary
333915.taubm1    ktao3.kpt.14      chandini  03:32:26  R      secondary
333916.taubm1    joblammps         daoud2    03:57:06  R      cse
States:
Q - Queued, waiting to run
R - Running
H - Held, by user or admin; won't run until released (see qhold, qrls)
C - Completed, finished running
E - Error; this usually doesn't happen and indicates a problem with the cluster
grep is your friend for finding specific jobs (e.g. qstat -u iendres2 | grep R gives all of my running jobs).

Managing Jobs
qalter, qdel, qhold, qmove, qmsg, qrerun, qrls, qselect, qsig, qstat
Each takes a job id plus some arguments.
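For example (the job ids are illustrative):

[iendres2 ~]$ qdel 333899     # kill a job
[iendres2 ~]$ qhold 333900    # hold a queued job
[iendres2 ~]$ qrls 333900     # release it again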
Problem: I want to run the same job with multiple parameters

#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00
cd ~/workdir/
./script <param1> <param2>

Solution: create a wrapper script to iterate over the params, where:
param1 = {a, b, c}
param2 = {1, 2, 3}
Problem 2: I can't pass parameters into my job script (where param1 = {a, b, c} and param2 = {1, 2, 3}).
We can pass parameters via the job name and delimit them using the "-" character (or whatever you want):

#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00

# Pass parameters via jobname:
export IFS="-"
i=1
for word in ${PBS_JOBNAME}; do
  echo $word
  arr[i]=$word
  ((i++))
done

# Stuff to execute
echo Jobname: ${arr[1]}
cd ~/workdir/
echo ${arr[2]} ${arr[3]}
Submit with:
[iendres2 ~]$ qsub -N job-param1-param2 job_script
qsub's -N parameter sets the job name; the script above splits ${PBS_JOBNAME} on "-" so that arr[1]=job, arr[2]=param1 and arr[3]=param2.
Back to the first problem - I want to run the same job with multiple parameters - now loop over qsub:

param1=({a,b,c})
param2=({1,2,3})  # or {1..3}
for p1 in ${param1[@]}; do
  for p2 in ${param2[@]}; do
    qsub -N job-${p1}-${p2} job_script
  done
done
Problem 3: My job isn't multithreaded, but needs to run many times.

A naive job script uses only one of the node's 12 cores:
#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00
cd ~/workdir/
./script ${idx}

Solution: run 12 independent processes on the same node so 11 CPUs don't sit idle:

#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00
cd ~/workdir/
# Run 12 jobs in the background
for idx in {1..12}; do
  ./script ${idx} &   # Your job goes here (keep the ampersand)
  pid[idx]=$!         # Record the PID
done

# Wait for all the processes to finish
for idx in {1..12}; do
  echo waiting on ${pid[idx]}
  wait ${pid[idx]}
done
Matlab and The Cluster

Simple Matlab sample:
#PBS -q VisionLanguage
#PBS -l nodes=1:ppn=12
#PBS -l walltime=04:00:00
cd ~/workdir/
matlab -nodisplay -r "matlab_func(); exit;"

Running more than a few matlab jobs (thinking about using the secondary queue)?
You may use too many licenses - especially for the Distributed Computing Toolbox (e.g. parfor).

Compiling Matlab Code
Doesn't use any matlab licenses once compiled.
Compiles matlab code into a standalone executable.
Constraints:
Code can't call addpath.
Functions called by eval, str2func, or other implicit methods must be explicitly identified; e.g. for eval('do_this') to work, the code must also include %#function do_this
To compile (within matlab):
>> addpath(everything that should be included)
>> mcc -m function_to_compile.m
isdeployed() is useful for modifying behavior in compiled applications (it returns true if the code is running as the compiled version).

Running Compiled Matlab Code
Requires the Matlab Compiler Runtime:
>> mcrinstaller  % This will point you to the installer and help install it
                 % make note of the installed path MCRPATH (e.g. /mcr/v716/)
Compilation generates two files: function_to_compile and run_function_to_compile.sh
To run:
[iendres2 ~]$ ./run_function_to_compile.sh MCRPATH param1 param2 ... paramk
Params will be passed into the matlab function as usual, except they will always be strings.
Useful trick:
function function_to_compile(param1, param2, ..., paramk)
  if(isdeployed)
    param1 = str2num(param1);
    % param2 expects a string
    paramk = str2num(paramk);
  end
Parallel For Loops on the Cluster
Matlab's local scheduler is not designed for multiple nodes sharing a filesystem: there is a race condition from concurrent writes to ~/.matlab/local_scheduler_data/
Easy fix: redirect that directory to /scratch.local.
1. Setup (done once, before submitting jobs):
[iendres2 ~]$ ln -sv /scratch.local/tmp/USER/matlab/local_scheduler_data ~/.matlab/local_scheduler_data
(Replace USER with your netid)
2. Wrap the matlabpool function to make sure the tmp data directory exists:
function matlabpool_robust(varargin)

% Close any existing pool
if(matlabpool('size') > 0)
  matlabpool close
end

% make sure the directories exist and are empty for good measure
system('rm -rf /scratch.local/tmp/USER/matlab/local_scheduler_data');
system('mkdir -p /scratch.local/tmp/USER/matlab/local_scheduler_data'); % assumed step: recreate the directory so the symlink target exists
% Run it:
matlabpool(varargin{:});

Warning: /scratch.local may get filled up by other users, in which case this will fail.

Best Practices
Interactive sessions: don't leave idle sessions open; it ties up the nodes.
Job arrays: they are still working out kinks in the scheduler - I managed to kill the whole cluster.
Disk I/O: minimize I/O for best performance; avoid small reads and writes due to metadata overhead.

Maintenance
Preventive maintenance (PM) on the cluster is generally scheduled monthly, on the third Wednesday of each month from 8 a.m. to 8 p.m. Central Time. The cluster will be returned to service earlier if maintenance is completed ahead of schedule.

Resources
Beginner's guide: https://campuscluster.illinois.edu/user_info/doc/beginner.html
More comprehensive user's guide: http://campuscluster.illinois.edu/user_info/doc/index.html
Cluster monitor: http://clustat.ncsa.illinois.edu/taub/
Simple sample job scripts: /projects/consult/pbs/
Forum: https://campuscluster.illinois.edu/forum/