
Using a CPU farm

Last Time and Today

„ Last time we discussed strings in the context
of Perl.
‰ Other advanced scripting languages have
similarly powerful tools.

„ Today we talk about CPU farms.
‰ To maximise their power we will take advantage
of advanced scripting.
What is a CPU farm?

„ A CPU farm is a collection of processors that
can be used to process many jobs in parallel.

„ It is not, strictly speaking, parallel processing,
as the jobs can be carried out in series.

„ CPU farms can allow you to carry out much
more intensive analysis than is possible if you
just have a single CPU.
What does a farm possess?

„ A front end
‰ This will be the machine that you log onto.
„ Disk
‰ There should be “a lot” of disk space available that you can
access from the front end and the nodes.
„ Many farm nodes.
‰ These are the CPUs that do the work.
‰ They will often also have their own local disk.
„ A network
‰ The nodes will be connected to the front end by a network.
‰ The network capacity can be the limiting factor.
The front end
„ When using the farm you will spend most of your
time on the front end.
„ Typically this will have the same OS as the nodes.
‰ You can compile code here.
„ You submit jobs from the front end to the nodes.
„ You manage the disk on the front end.
„ You might take a quick look at the output here.
„ Remember the front end will have many other users,
so try and be as undisruptive as possible.
The nodes
„ The nodes are where your CPU time occurs.
„ Usually they will have local disk.
‰ Using this will cut down on network traffic.
„ Improves farm performance.
‰ Be careful about how much space is available.
„ On some farms the same box may be several
nodes.
‰ Dual CPU machines
‰ Hyperthreading.
„ They will have high memory, but watch out for
programs with very high memory usage; they may
not play well together.
Jobs on a farm

„ Jobs on a farm may be thought of a bit like
files on a file system.
„ There are commands that can
‰ list them
‰ delete them
„ They have an owner
„ You cannot
‰ copy them
‰ do (many) operations on them.
An example.

„ To illustrate the use of a farm I will use an
example.
„ The CPU farm at RAL PPD
„ Generally you will be able to get an account
here if you need it (hep).
„ The commands are one implementation of
the grid engine.
‰ Most farms use the same commands.
‰ All farms have commands that do the same thing.
Submitting a job

„ You use the command qsub
„ This will return a report to the screen that
includes a unique job id for the job you just
submitted.
‰ You may need this job id later.
„ qsub my_job.scr
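
Since you may need the job id later, a master script can capture it from the report that qsub prints. The following is a minimal Python sketch, not the RAL recipe: the two report formats it handles are assumptions about what your farm prints, and `submit` and `parse_job_id` are illustrative names.

```python
import re
import subprocess


def parse_job_id(qsub_output):
    """Return the first whole number in qsub's report, or None if there is none.

    Assumed report formats (check what your farm actually prints):
      Your job 12345 ("my_job.scr") has been submitted
      12345.some.server.name
    """
    match = re.search(r"\b(\d+)\b", qsub_output)
    return match.group(1) if match else None


def submit(script):
    """Run 'qsub script' and return the job id parsed from its report."""
    report = subprocess.run(
        ["qsub", script], capture_output=True, text=True, check=True
    ).stdout
    return parse_job_id(report)
```

Keeping the parsing separate from the call to qsub means the parsing can be checked on the front end without submitting anything.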
Listing the jobs submitted to a farm.
„ To list the jobs use the command qstat
„ This will tell you
‰ The job name
‰ The job ID
‰ Its status
‰ The running time
‰ The owner.
„ Use qstat -u username to see the jobs belonging to
a particular user.
„ There are other useful switches
‰ See the man page.
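
A master script often only needs to know how many jobs a user has on the farm. A rough Python sketch follows, assuming qstat prints one line per job with the owner as a whitespace-separated column; the real column layout differs between farms, and `count_jobs` is an illustrative name.

```python
def count_jobs(qstat_output, username):
    """Count the lines of qstat output that have username as a whole column.

    Assumption: one job per line, owner in its own column. Header lines are
    ignored unless they happen to contain the username as a word.
    """
    return sum(
        1 for line in qstat_output.splitlines() if username in line.split()
    )
```

In practice the text would come from running `qstat -u username` with `subprocess.run(..., capture_output=True, text=True)`.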
Job Status
„ Running
‰ A job that is currently running on a node.
‰ You will be able to see how long it has been running with qstat
„ Queued
‰ A job that is waiting for a free node.
„ Terminated
‰ A job that is finished. You won’t see these in qstat
„ Suspended/Error
‰ Something has happened to the job and it is in an error state.
„ This is probably your fault.
„ But it could be a system error, so it’s worth restarting these
once.
Deleting a job

„ You can use the qdel command
‰ qdel jobid
„ The job id can be obtained from qsub or
qstat.
„ You will only be able to delete your own jobs.
‰ Unless you have manager privileges.
Different queues

„ Some farms (like RAL) have different queues.
„ These are to separate resources for different
groups (experiments) from the main queue.
„ There may also be a fast queue
‰ A queue with few nodes and a short maximum
duration
‰ Good for testing.
„ The queue is specified in the qsub command.
Maximum duration
„ Some queues expect a maximum duration.
‰ When exceeded the job will be terminated.
„ Set using qsub
‰ At RAL: qsub -l cput=24:00:00 myjob.scr
„ The duration you set can control which queue
you use and the resources available.
„ Be careful when setting the maximum
duration: try and keep it short, but long
enough that your job will finish.
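
When jobs are submitted from a script, the duration limit can be built from an estimate of the run time. This Python sketch assumes the `-l cput=HH:MM:SS` form shown for RAL; the safety margin of 1.5 and the function names are illustrative choices, not farm policy.

```python
import math


def format_cput(seconds):
    """Format a number of seconds as HH:MM:SS for the cput resource."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def limit_with_margin(estimate_seconds, margin=1.5):
    """Pad an estimated run time so the job is not killed just before it ends."""
    return math.ceil(estimate_seconds * margin)


def qsub_command(script, max_seconds):
    """Build the qsub argument list with a maximum CPU time."""
    return ["qsub", "-l", f"cput={format_cput(max_seconds)}", script]
```

For example, `qsub_command("myjob.scr", limit_with_margin(3600))` asks for an hour and a half for a job expected to take an hour.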
The local disk

„ Use the local disk whenever possible
‰ Copy data files to it at the beginning.
‰ Use it for output and temp files.
‰ Copy output to main disk at the end.
„ Take care however not to fill the disk.
„ On many systems the local disk is available
as $TMPDIR
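
The copy-in, work, copy-out pattern can also be written in a scripting language. The Python sketch below assumes the local disk is given by `$TMPDIR` (falling back to the system temp directory); `run_on_local_disk`, `work` and `needed_bytes` are hypothetical names introduced for illustration.

```python
import os
import shutil
import tempfile


def run_on_local_disk(input_file, output_dir, work, needed_bytes=0):
    """Copy input to local disk, run work() there, copy the result back.

    work(local_input, workdir) must return the path of the output file
    it wrote inside workdir.
    """
    scratch = os.environ.get("TMPDIR", tempfile.gettempdir())
    # Take care not to fill the disk: refuse to start if space is short.
    if shutil.disk_usage(scratch).free < needed_bytes:
        raise RuntimeError("not enough free space on local disk")
    workdir = tempfile.mkdtemp(dir=scratch)
    try:
        local_input = shutil.copy(input_file, workdir)  # copy data in
        result = work(local_input, workdir)             # output stays local
        return shutil.copy(result, output_dir)          # copy output back
    finally:
        shutil.rmtree(workdir, ignore_errors=True)      # clean up scratch
```

Cleaning up in `finally` means the local disk is emptied even if the analysis fails part way through.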
General advice
„ Use a shell script to control your job.
‰ Don’t directly submit your executable.
‰ This gives you more control.
„ Use an advanced script to control your queue
usage.
‰ Write command files
‰ Write job shell scripts.
‰ Submit jobs.
„ Don’t submit too many jobs at once.
A typical job script.

#!/bin/tcsh
# Set up my environment
source ~/env_script.csh
# Use the local disk
cd $TMPDIR
# Copy needed data to local disk
cp ~/input_data/my_data .
# Run my analysis code
$SNO_CODE/snoman.exe -c mycmd.cmd
# Copy my results back to the data disk
cp result.ntp ~/output_data/
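
With many runs, the job scripts themselves can be written by a master script. The Python sketch below fills in a template modelled on the job script above; the per-run file names and the `make_job_script` function are hypothetical, and the command file is passed as a full path because it is not copied to the local disk.

```python
# Template modelled on the tcsh job script; {data}, {cmd} and {output}
# are filled in per run. The file names used are illustrative only.
JOB_TEMPLATE = """#!/bin/tcsh
# Set up my environment
source ~/env_script.csh
# Use the local disk
cd $TMPDIR
# Copy needed data to local disk
cp ~/input_data/{data} .
# Run my analysis code
$SNO_CODE/snoman.exe -c {cmd}
# Copy my results back to the data disk
cp result.ntp ~/output_data/{output}
"""


def make_job_script(data, cmd, output):
    """Return the text of a job script for one run."""
    return JOB_TEMPLATE.format(data=data, cmd=cmd, output=output)


def write_job_script(path, data, cmd, output):
    """Write one job script to path, ready to be passed to qsub."""
    with open(path, "w") as handle:
        handle.write(make_job_script(data, cmd, output))
```

A loop over run numbers can then call `write_job_script` once per run and submit each file.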
Job Master Script

„ You can put much that we discussed in the
last two lectures into action.
‰ Writing multiple command files and shell scripts.
‰ Running system programs and analysing their
output.
‰ Examining the output of your analysis programs.
„ You can put limits on the number of jobs in
two ways.
‰ Using the sleep command when too many jobs
are submitted.
‰ Using a cronjob
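
The sleep approach can be sketched as a loop that waits whenever the farm already holds too many of your jobs. In this Python sketch `count_jobs` and `submit` are placeholders you supply (for instance wrappers around qstat and qsub), and the 60 second wait is an arbitrary choice.

```python
import time


def submit_throttled(scripts, max_jobs, count_jobs, submit,
                     wait_seconds=60, sleep=time.sleep):
    """Submit each script, sleeping while max_jobs or more are on the farm.

    count_jobs() returns how many of your jobs are currently queued or
    running; submit(script) submits one job and returns its job id.
    """
    job_ids = []
    for script in scripts:
        while count_jobs() >= max_jobs:
            sleep(wait_seconds)  # wait for some jobs to finish
        job_ids.append(submit(script))
    return job_ids
```

Passing the counting and submitting functions in as arguments lets the loop be tried out with stand-ins before it is pointed at the real farm.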
Cronjobs

„ A cronjob is a job that runs at a scheduled
time.
„ Your cronjobs are controlled by your crontab.
‰ Not allowed on all systems (including RAL).
„ To edit your crontab use
‰ crontab -e
‰ Your $EDITOR variable decides which editor is
used.
‰ You need to exit the editor for the change to take
effect.
A typical crontab.

# My program
0 * * * * my_program > /dev/null 2>> /dev/null
#End of crontab.

„ The first five fields give the time to run the job.
„ Redirect output. Otherwise you’ll get an email.
„ The last line is a comment to end the crontab. You need a newline
at the end of each command.
„ Time is specified by five fields.
‰ minute, hour, day of month, month, day of week
‰ * is a wild card that means any
‰ When the system time equals this time the job will
run.
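
To make the matching rule concrete, here is a small Python sketch that checks whether a schedule fires at a given time. It is a simplification that assumes each field is either `*` or a single number (real crontabs also allow lists, ranges and steps), with day of week counted from 0 for Sunday to 6 for Saturday.

```python
from datetime import datetime


def field_matches(field, value):
    """A field matches if it is the wild card or equals the value."""
    return field == "*" or int(field) == value


def cron_matches(schedule, when):
    """Return True if the five-field schedule fires at the datetime when."""
    minute, hour, dom, month, dow = schedule.split()
    if not (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(month, when.month)):
        return False
    cron_dow = when.isoweekday() % 7  # cron counts Sunday as 0
    dom_ok = field_matches(dom, when.day)
    dow_ok = field_matches(dow, cron_dow)
    # If both day fields are restricted, cron runs when either matches.
    if dom != "*" and dow != "*":
        return dom_ok or dow_ok
    return dom_ok and dow_ok
```

With this, `cron_matches("0 * * * *", datetime(2024, 1, 1, 13, 0))` is true, because the schedule in the example crontab fires at minute zero of every hour.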
The GRID

„ The GRID is an extension of single site
farms.
‰ A farm of farms.
„ It will be used extensively in LHC and other
experiments.
„ The use of this is beyond the scope of this
course.
‰ When the time comes to use it you can talk to
your collaborators.
Exercises

„ Adapt your multiple command file script to
control job submission to a farm.
„ Write a program to send you a simple email,
and add it to your crontab so it will send it at
midnight.
