
mika.a.holappa(a)nokia.com
15-Aug-2019
0p12

1 © 2018 Nokia <Document ID: change ID in footer or remove>


Cluster and job
• An LSF cluster consists of a master host, submit hosts (clients) and execution hosts. There are
independent LSF clusters in the following datacenters :
Datacenter            Hosts in Grid     Hosts in LSF (24 Jun 2019 =>)
Finland/Oulu          26 (HW renewal)   169
USA/Sunnyvale         0                 24
USA/Franklin Park     1                 10
France/Marcoussis     1                 30
China/Hangzhou        30 (HW renewal)   35

lshosts - view static resource information about hosts in the cluster


bhosts - View resource and job information about server hosts in the cluster
lsid - View the cluster name
lsclusters - View cluster status and size

• A job is a command that is submitted to LSF for execution.

bjobs - View jobs in the system


bsub - Submit jobs to the system



Job slot and job states
• A job slot is a bucket into which one job is assigned in the LSF system. In *_soc queues we
have one job slot for each physical CPU core (hyperthreading is disabled). In other queues
there is one job slot for each hyperthreaded core.
• Typical execution host : 2 processors * 18 cores/processor = 36 cores = 36 job slots

bhosts - View job slot limits for hosts and host groups
bqueues - View job slot limits for queues
busers - View job slot limits for users and user groups

• Job states :
• PEND - Waiting in a queue for scheduling and dispatch
• RUN - Dispatched to an execution host and running
• DONE - Finished normally with zero exit value
• EXIT - Finished with non-zero exit value
• PSUSP - Suspended while pending
• USUSP - Suspended by user
• SSUSP - Suspended by the LSF system
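
The state list above can be turned into a small lookup for use in wrapper scripts. This is only a sketch: describe_state and is_finished are hypothetical helper names, not LSF commands, and the descriptions are copied from the list above.

```shell
#!/bin/sh
# describe_state STATE : print the meaning of an LSF job state.
# Hypothetical helper for illustration only, not part of LSF.
describe_state() {
    case "$1" in
        PEND)  echo "waiting in a queue for scheduling and dispatch" ;;
        RUN)   echo "dispatched to an execution host and running" ;;
        DONE)  echo "finished normally with zero exit value" ;;
        EXIT)  echo "finished with non-zero exit value" ;;
        PSUSP) echo "suspended while pending" ;;
        USUSP) echo "suspended by user" ;;
        SSUSP) echo "suspended by the LSF system" ;;
        *)     echo "unknown state: $1"; return 1 ;;
    esac
}

# is_finished STATE : succeed only for the two finished states.
is_finished() {
    case "$1" in
        DONE|EXIT) return 0 ;;
        *)         return 1 ;;
    esac
}

describe_state RUN
```

A script that polls a job could call is_finished on the state it reads and stop waiting once it succeeds.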
Queue
• All jobs wait in queues until they are scheduled and dispatched to execution hosts.
• Queues available in the system :
Name        Type               Purpose
i           Interactive (fg)   Interactive work with GUI. Open for all users.
b           Batch (bg)         Background execution, regression runs. Open for all users.
i_soc       Interactive (fg)   Interactive work with GUI. For SoC users only.
b_soc       Batch (bg)         Background execution, regression runs. For SoC users only.
i_soc_rh7   Interactive (fg)   Same as i_soc but with RHEL7.

• Access to all queues ending with _soc is restricted to users who belong to the
"soc_users" or "soc_extusers" UNIX group. In LSF it doesn’t have to be the default UNIX
group (gid) like in grid.
• The queue must always be defined in the bsub command!

bqueues - View available queues


bsub -q <queuename> - Submit a job to a specific queue
bparams - View default queues



Hosts
• Each LSF cluster has one master host, a number of submit hosts and a number of
execution hosts.
• Master host acts as the overall coordinator (job scheduling and dispatch) for the cluster.

lsid - View the master host name

• Submit host (server:No) is a host (login host) from where you can submit jobs to the LSF
system :

bsub - Submit a job


bjobs - View jobs that are submitted

• Execution host (server:Yes) is a host where a job runs.

bjobs - View where a job runs



Load values
• Static load values (reported by the Load Information Manager, LIM, during boot time) :
• ncpus - Number of CPU cores
• maxmem - Total available memory
• maxswp - Total available swap
• maxtmp - Total available /tmp diskspace

lshosts -l - View static load values

• Dynamic load values (reported by LIM periodically) :
• status - Host status
• r15s, r1m and r15m - 15s/1min/15min load values (run queue lengths)
• ut - CPU utilization
• pg - paging rate
• ls - number of login sessions
• it - interactive idle time
• swp - available swap space
• mem - available memory
• tmp - available /tmp space
• io - disk IO rate

lsload -l - View dynamic load values
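
As a rough sketch of how the dynamic load values can be used, the filter below reads plain lsload output (without -l) and lists the hosts that are "ok" and lightly loaded. idle_hosts is a hypothetical helper name, and the sample listing is made up for illustration. The column order assumed here (HOST_NAME, status, r15s, r1m, r15m, ut, ...) is the default lsload layout; check it against your own output before relying on it.

```shell
#!/bin/sh
# idle_hosts MAX_UT : read `lsload` output on stdin and print the names of
# hosts whose status is "ok" and whose CPU utilization (ut) is <= MAX_UT.
# Hypothetical helper for illustration only, not part of LSF.
idle_hosts() {
    awk -v max_ut="$1" '
        NR == 1 { next }                              # skip the header line
        $2 == "ok" && ($6 + 0) <= max_ut { print $1 } # "5%" + 0 evaluates to 5
    '
}

# Made-up sample in the default lsload column order.
sample='HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
hostA ok 0.0 0.1 0.1 5% 0.0 1 12 122G 30G 200G
hostB ok 12.0 11.5 10.9 95% 0.0 0 300 100G 30G 20G
hostC unavail'

printf '%s\n' "$sample" | idle_hosts 50    # prints: hostA
```

On a login host the real data would come from "lsload | idle_hosts 50".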



Load values
[Figure. Labels: Master hosts, Execution hosts, Submit hosts; STATIC, DYNAMIC]



Resources
• The LSF system uses built-in and configured resources to track resource availability and
usage. Jobs are scheduled according to the resources available on individual hosts.

lsinfo - View the resources available in your cluster


bjobs -l - View current resource usage of a job

• Restrict which hosts the job can run on with resource requirements (bsub -R). Hosts
that match the resource requirements are the candidate hosts. When LSF schedules a
job, it collects the load index values of all the candidate hosts and compares them to the
scheduling conditions. Jobs are only dispatched to a host if all load values are within the
scheduling thresholds.

bsub -R - Specify resource requirement string for a job


bsub -q b_soc -R "rusage[mem=4000]" mycmd - Example
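
A wrapper script can build the resource requirement string instead of typing it by hand. The sketch below only prints the command (dry run) so that the quoting can be checked before anything is submitted. bsub_mem_cmd is a hypothetical helper name, and the memory value is taken to be in MB as elsewhere in this material.

```shell
#!/bin/sh
# bsub_mem_cmd QUEUE MEM_MB CMD... : print (do not run) a bsub command line
# that reserves MEM_MB of memory with rusage[mem=...].
# Hypothetical helper for illustration only, not part of LSF.
bsub_mem_cmd() {
    queue=$1
    mem_mb=$2
    shift 2
    case "$mem_mb" in
        ''|*[!0-9]*)
            echo "memory must be a whole number of MB: $mem_mb" >&2
            return 1 ;;
    esac
    printf 'bsub -q %s -R "rusage[mem=%s]" %s\n' "$queue" "$mem_mb" "$*"
}

bsub_mem_cmd b_soc 4000 mycmd
# prints: bsub -q b_soc -R "rusage[mem=4000]" mycmd
```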



Job lifecycle
[Figure: job lifecycle. Labels: Job; DONE, EXIT; Login host : oulnxc67; oulng300; ouhwlsfm50]

Setup and basic structure
• Detailed information about LSF can be found in :
• IDM : https://nokia.sharepoint.com/sites/idm => Documents => User guides => LSF
• LSF_guide : This material
• Switches_from_grid_to_lsf : Grid cmd line switches vs LSF cmd line switches
• IBM : https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0
• IT : https://confluence.inside.nsn.com/display/HWEE/IBM+Spectrum+LSF
• LSF environment setup :
• LSF is set up automatically in all proj setups (module load proj/<proj>/<var>).
• LSF can also be set up manually with module load lsf.
• LSF commands can be run directly on the login host (the host where your VNC server is running).
• DO NOT run terminals in LSF. Instead, run your tools/scripts in LSF with bsub.
• Jobs can be run in the foreground (bsub -Is) or in the background (bsub). A foreground job is
typically interactive work with a GUI. Regression simulations are run as background jobs.
• LSF command help (see next slide for list of commands) : man <cmd>

Setup and basic structure
• Queue must always be defined in bsub command with -q <queue>.
• Currently available queues (B=batch/background, I=interactive/foreground) :
Queue       Type   Command    Users                     Purpose
i           I      bsub -Is   ALL users                 Design debug/development work using GUIs in foreground.
b           B      bsub       ALL users                 Regression runs in background.
i_soc       I      bsub -Is   soc_users, soc_extusers   Design debug/development work using GUIs in foreground.
b_soc       B      bsub       soc_users, soc_extusers   Regression runs in background.
i_soc_rh7   I      bsub -Is   soc_users, soc_extusers   Same as i_soc but with RHEL7.

• You can specify multiple queues separated by space : -q "<queue1> <queue2>"
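
The queue table maps directly to the form of the bsub command: interactive queues need -Is, batch queues do not. A minimal sketch of that mapping follows. queue_bsub is a hypothetical helper name, and the queue names are the ones listed above, so the function needs updating if the queue list changes.

```shell
#!/bin/sh
# queue_bsub QUEUE : print the bsub prefix that matches the queue type.
# Hypothetical helper for illustration only, not part of LSF.
queue_bsub() {
    case "$1" in
        i|i_soc|i_soc_rh7) echo "bsub -Is -q $1" ;;  # interactive / foreground
        b|b_soc)           echo "bsub -q $1" ;;      # batch / background
        *) echo "unknown queue: $1" >&2; return 1 ;;
    esac
}

queue_bsub i_soc    # prints: bsub -Is -q i_soc
queue_bsub b_soc    # prints: bsub -q b_soc
```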



Starting foreground job
bsub -Is -q i_soc -R "rusage[mem=2000]" verdi (Note : starting verdi EDA tool here)

-Is Submit interactive X-window job with SSH.

-q i_soc Run job in i_soc queue (see previous slides for available queues).
NOTE : Always define the queue in bsub commands!
-R "rusage[mem=2000]" Peak memory need (MB) for the whole duration of the job.

Note: The working directory is by default the same path where you were when running the
bsub command (grid -cwd switch).
Environment variables are automatically inherited (grid -V switch).

15 © Nokia Solutions and Networks 2014


Confidential
Starting background job
bsub -q b_soc -R "rusage[mem=5000]" my_regression_script.csh
-q b_soc Run job in b_soc queue (see previous slides for available queues).
NOTE : Always define the queue in bsub commands!
-R "rusage[mem=5000]" Peak memory (MB) need for the whole duration of the job.

-oo $PWD/bsub_o.log Send std out and std err to file. Use -o /dev/null to disable automatic email
notifications.
-eo $PWD/bsub_e.log Send std err to a different file.

Note: The working directory is by default the same path where you were when running the
bsub command (grid -cwd switch).
Environment variables are automatically inherited (grid -V switch).
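
When several regressions write to the same directory, fixed log names get overwritten. LSF replaces %J in -oo/-eo file names with the job ID, which gives one log pair per job. The sketch below prints such a command as a dry run. bg_cmd and the log directory are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# bg_cmd QUEUE MEM_MB LOGDIR SCRIPT : print (do not run) a background bsub
# command with per-job stdout/stderr files; %J is expanded by LSF to the job ID.
# Hypothetical helper for illustration only, not part of LSF.
bg_cmd() {
    printf 'bsub -q %s -R "rusage[mem=%s]" -oo %s/bsub_o.%%J.log -eo %s/bsub_e.%%J.log %s\n' \
        "$1" "$2" "$3" "$3" "$4"
}

bg_cmd b_soc 5000 /tmp/logs my_regression_script.csh
# prints: bsub -q b_soc -R "rusage[mem=5000]" -oo /tmp/logs/bsub_o.%J.log -eo /tmp/logs/bsub_e.%J.log my_regression_script.csh
```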

Starting background job / script based
bsub < my_regression_script.csh
• In this case you feed in the bsub switches inside your script using the special #BSUB
comments in the script header.

• Submit this script to the b_soc queue by feeding it to bsub on standard input :
"bsub < my_regression_script.csh"
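
A minimal sketch of a script with #BSUB comments in its header is shown below. It is written for /bin/sh rather than csh, and the job name, log file names and script body are assumptions made for illustration. The switches themselves (-q, -R, -J, -oo, -eo) are the ones described in this material, and LSB_JOBID is the variable LSF sets to the job ID inside a running job.

```shell
#!/bin/sh
#BSUB -q b_soc
#BSUB -R "rusage[mem=5000]"
#BSUB -J my_regression
#BSUB -oo bsub_o.log
#BSUB -eo bsub_e.log

# The #BSUB lines above are read by bsub as command line switches;
# to the shell they are ordinary comments.

# LSB_JOBID is set by LSF on the execution host. "unknown" is used as a
# fallback when the script is run directly, outside LSF.
job_label="job ${LSB_JOBID:-unknown}"
echo "$job_label started on $(uname -n)"

# ... the actual regression commands would go here ...
```

Saved as, say, my_job.sh (a hypothetical name), it would be submitted with "bsub < my_job.sh". Switches given on the bsub command line take precedence over the same switches in the #BSUB comments.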



Memory reservations
Information about how LSF behaves when user’s process tree mem size exceeds the given
rusage[mem=X] value will be added here later…



Other useful options for bsub
-J "my job" Give name "my job" for the job.

-B Send email when the job is started.

-N Send email after the job has ended.

-u mika.a.holappa@nokia.com Defines email address for email notifications. Default is
submitter’s email address.
-S <size in MB> Limit stack mem. size to <size in MB> (do not define if not
absolutely needed, global default is unlimited)
-L /bin/tcsh Initialize execution environment with specified login shell.

-sla bigmem Submit job to specified service class (not implemented yet).

-app quartus Use “quartus” application profile (not implemented yet).

-W [HH:]MM Set run time limit for the job.

-We [HH:]MM Specifies an estimation run time for the job.


Simple sample script

Checking status of running/pending jobs
• List my current jobs : bjobs

• List all jobs for all users of all queues : bjobs -u all (add switch ”-q b_soc” to select one queue)

Note! Add -w to view the result in wide format. Without -w, long values are truncated in the
listing; for example, the user name "mikahola" is shown as "mikahol".
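
With many jobs in the system it is often enough to see how many are in each state. The sketch below counts the STAT column (third field) of bjobs -w output. count_states is a hypothetical helper name, and the sample listing, including its job IDs and host names, is made up for illustration.

```shell
#!/bin/sh
# count_states : read `bjobs -w` output on stdin and print "STATE COUNT"
# lines, sorted by state name.
# Hypothetical helper for illustration only, not part of LSF.
count_states() {
    awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Made-up sample; the pending job has an empty EXEC_HOST column.
sample='JOBID   USER     STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
101     mikahola RUN   b_soc  hostA      hostX      regr1     Aug 15 10:00
102     mikahola PEND  b_soc  hostA                 regr2     Aug 15 10:01
103     mikahola RUN   i_soc  hostA      hostY      verdi     Aug 15 10:02'

printf '%s\n' "$sample" | count_states
# prints:
# PEND 1
# RUN 2
```

On a login host the real data would come from "bjobs -w -u all -q b_soc | count_states".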
Checking status of running/pending jobs
• List verbose information about a job : bjobs -l <job-id>

Checking status of running/pending jobs
• View the stdout of running job : bpeek <job-id>

Delete job
• Delete job <job-id> : "bkill <job-id>"
• If the job refuses to die, then follow these steps :
• Find out the hostname <host> of the execution host (bjobs shows where a job runs).
• Make sure you don’t have other jobs running on the same execution host.
• Kill all your processes on that execution host : "ssh $USER@<host> kill -9 -1"
• You can also selectively kill the processes related to a problem job. You can find out the process IDs with :
"ssh $USER@<host> pstree -lcpGAau $USER"

Analyzing results
• You can create LSF logfiles with -oo <file_out> (and -eo <file_err>) switches.

Analyzing results
• Show information about jobs completed (-C) on 11 Nov 2018, run in queue
b_soc (-q) by the user mikahola (-u) :

Analyzing results
• Show information about completed job 2926 :

CPU and memory affinity
• Details in :
https://nokia.sharepoint.com/sites/idm/Shared%20Documents/User%20guides/LSF/13%20CPU%20and%20memory%20affinity%20H0231G14.pdf
• THIS AREA IS STILL UNDER WORK!

All used 4 cores from same host and same socket (physical CPU).

Copyright and confidentiality

The contents of this document are proprietary and confidential property of Nokia. This document is provided subject to confidentiality obligations of the applicable agreement(s).

This document is intended for use of Nokia’s customers and collaborators only for the purpose for which this document is submitted by Nokia. No part of this document may be reproduced or made available to the public or to any third party in any form or means without the prior written permission of Nokia. This document is to be used by properly trained professional personnel. Any use of the contents in this document is limited strictly to the use(s) specifically created in the applicable agreement(s) under which the document is submitted. The user of this document may voluntarily provide suggestions, comments or other feedback to Nokia in respect of the contents of this document ("Feedback"). Such Feedback may be used in Nokia products and related specifications or other documentation. Accordingly, if the user of this document gives Nokia Feedback on the contents of this document, Nokia may freely use, disclose, reproduce, license, distribute and otherwise commercialize the feedback in any Nokia product, technology, service, specification or other documentation.

Nokia operates a policy of ongoing development. Nokia reserves the right to make changes and improvements to any of the products and/or services described in this document or withdraw this document at any time without prior notice.

The contents of this document are provided "as is". Except as required by applicable law, no warranties of any kind, either express or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose, are made in relation to the accuracy, reliability or contents of this document. NOKIA SHALL NOT BE RESPONSIBLE IN ANY EVENT FOR ERRORS IN THIS DOCUMENT or for any loss of data or income or any special, incidental, consequential, indirect or direct damages howsoever caused, that might arise from the use of this document or any contents of this document.

This document and the product(s) it describes are protected by copyright according to the applicable laws.

Nokia is a registered trademark of Nokia Corporation. Other product and company names mentioned herein may be trademarks or trade names of their respective owners.
