Using The HPC Cluster

Usage instructions have moved here: https://confluence.infn.it/display/TD/6+-+The+HPC+cluster

OLD STUFF:

Requesting Access

To access the cluster, first obtain a CNAF account by following the procedure you can find at this link.
In the "reason" field of the application form, state that you need access to the HPC cluster.
Please specify "Daniele Cesini" as the contact person.

Log in to the User Interface

Once your CNAF account has been created, you can log in to the bastion host.
This is not your user interface!

To access the cluster, log in from the bastion to:

ui-hpc.cr.cnaf.infn.it

using the same bastion credentials.
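
The two-hop login described above can also be done in one step with the OpenSSH ProxyJump option (-J, available from OpenSSH 7.3). This is only a sketch: the bastion hostname is not given on this page, so bastion.example.org and the username alice are placeholders, and the helper name hpc_ssh_command is hypothetical. The function only prints the command, so you can check it before running it.

```shell
#!/bin/sh
# Print the ssh command that reaches the user interface through the bastion.
# Arguments: <username> <bastion_host>  (both are placeholders to fill in)
hpc_ssh_command() {
    printf 'ssh -J %s@%s %s@ui-hpc.cr.cnaf.infn.it\n' "$1" "$2" "$1"
}

# Example with placeholder values (not real account or host names):
hpc_ssh_command alice bastion.example.org
```

The same username appears twice because the user interface accepts the bastion credentials.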

Getting Support

For information and support, write to:

hpc-support <_at_> lists.cnaf.infn.it

Home Directory and extra disk space

Your home directory:

/home/HPC/<your_username>/

in the user interface is shared among all the cluster nodes.

No quotas are currently enforced on the home directories, and only about 4 TB are available in the /home partition.

If you need more disk space for data and checkpointing, every user can access the following directory:

/storage/gpfs_maestro/hpc/user/<your_username>/

which is on shared GPFS storage.

Please do not leave huge unused files in either the home directories or the GPFS storage areas. Quotas will be enforced in the near future.
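
To keep an eye on how much space you occupy in the two areas above, something like the following sketch may help. The helper name report_usage is hypothetical, and the example call assumes that $USER matches your cluster username.

```shell
#!/bin/sh
# Print the total size of each directory given as an argument.
# Directories that do not exist are reported on stderr and skipped.
report_usage() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            du -sh "$dir"
        else
            echo "skipping $dir (not found)" >&2
        fi
    done
}

# Assumption: your cluster username is the same as $USER on this machine.
me="${USER:-$(id -un)}"
report_usage "/home/HPC/$me" "/storage/gpfs_maestro/hpc/user/$me"
```

Note that du walks the whole directory tree, so it can take a while on large areas.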

The LSF Batch System

The cluster is managed and accessed via the LSF (version 9.1.2) batch system.
A detailed LSF user guide can be found at this IBM page. What follows is a minimal how-to describing the basic operations needed to use the CNAF HPC cluster for various job types.

Querying the Cluster Status with LSF

To obtain an overview of the node status:

bhosts -w

To obtain the queue status:

bqueues

Add the option "-l" to obtain detailed information.

Currently four queues are defined:

hpc_inf: Max CPU Time is 128 core days (24 hours using 128 cores or 48 hours using 64 cores) and Max WallClock Time is 79 hours.
hpc_short: Max CPU Time is 32 core days (24 hours using 32 cores or 48 hours using 16 cores) and Max WallClock Time is 79 hours.
hpc_gpu: Max WallClock Time in this queue is 33 hours. To be used only for jobs requiring GPUs and using few CPUs (see below).
hpc_int: Max WallClock Time in this queue is 2 hours. To be used only for interactive jobs (see below).
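
The core-day limits above translate into a maximum run time that depends on how many cores a job uses. The sketch below follows this page's reading of the limits (CPU time budget divided evenly among the cores, capped by the wall-clock limit). The helper name max_run_hours is hypothetical.

```shell
#!/bin/sh
# Hours a job can run, given the queue's CPU-time budget in core days,
# the number of cores used, and the queue's wall-clock limit in hours.
# Integer arithmetic: the result is rounded down.
max_run_hours() {
    core_days="$1"; cores="$2"; wall_limit="$3"
    hours=$(( core_days * 24 / cores ))
    if [ "$hours" -gt "$wall_limit" ]; then
        hours="$wall_limit"
    fi
    echo "$hours"
}

max_run_hours 128 64 79   # hpc_inf with 64 cores    -> 48
max_run_hours 32 16 79    # hpc_short with 16 cores  -> 48
max_run_hours 32 8 79     # hpc_short with 8 cores   -> 79 (wall-clock cap)
```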

To obtain node load information:

lsload

For much more detail:

lsload -l

To restrict the lsload query to specific numerical fields (e.g. io and r15s), use the "-I" option:

lsload -I io:r15s

To restrict the query to string fields (e.g. gpu_model0), use the "-s" option:

lsload -s gpu_model0

Submitting Single Batch Jobs

Single batch jobs can be submitted via the bsub command.
Use the options "-o" and "-e" to redirect standard output and standard error.
The option "-m" selects specific nodes if needed.
For example:

bsub -o test.out -e test.err /usr/bin/whoami
bsub -o test.out -e test.err -m 'hpc-200-06-05' /bin/hostname
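
When scripting submissions it is handy to capture the job ID that bsub prints on success, in a line of the form "Job <1234> is submitted to queue <hpc_inf>.". A minimal sketch follows; the helper name parse_job_id is hypothetical and the job ID in the demonstration is made up.

```shell
#!/bin/sh
# Read bsub's output on stdin and print only the numeric job ID.
# Lines that do not look like a submission message produce no output.
parse_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Demonstration on a sample line (no LSF needed):
echo 'Job <1234> is submitted to queue <hpc_inf>.' | parse_job_id

# On the cluster this could be used as:
#   jobid=$(bsub -o test.out -e test.err /bin/hostname | parse_job_id)
```

The captured ID can then be passed to bjobs or bkill.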

Standard Output and Standard Error

As stated above, standard output and standard error can be redirected with the "-o" and "-e" options of the bsub command.
Files generated this way become available when the job ends. They are owned by root but can be read and removed by the user. They cannot be edited directly; if you need to edit them, make a copy with "cp".

To get files that are updated in real time, redirect standard output and error with ">" after the executable name and enclose the whole command in single quotes, e.g.:

bsub -o test.out -e test.err '/usr/bin/whoami > std.out 2>&1'

The single quotes are important: without them, the output of the bsub command itself would be redirected.

Check the Job Status

Job status can be queried with the bjobs command.
Use the "-w" option to get the wide format, which displays job information without truncating fields.
Use the "-W" or "-l" option to get detailed information about the job.
Use the job number to get information about a single job.
Use the "-u" option to get information about a single user's jobs. The "-a" option displays information about jobs in all states, including recently finished jobs.
For example:

bjobs -W 
bjobs -l <JOBID>
bjobs -a -u <USERNAME>
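
For scripts that need to wait for a job, the state can be read from the STAT column (third field) of the default bjobs output. The sketch below is one possible approach, not an official recipe: the helper names job_state and wait_for_job and the 30-second polling interval are arbitrary choices, and the sample output is made up.

```shell
#!/bin/sh
# Read default-format bjobs output on stdin and print the STAT column
# of the row whose JOBID matches the argument.
job_state() {
    awk -v id="$1" '$1 == id { print $3 }'
}

# Poll until the job reaches a final state (DONE or EXIT) or is no longer
# reported. Defined only; calling it requires a working LSF installation.
wait_for_job() {
    while :; do
        state=$(bjobs "$1" 2>/dev/null | job_state "$1")
        case "$state" in
            DONE|EXIT|"") break ;;
        esac
        sleep 30
    done
    echo "${state:-UNKNOWN}"
}

# Demonstration on made-up sample output (no LSF needed):
printf '%s\n' \
    'JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST      JOB_NAME   SUBMIT_TIME' \
    '1234    alice   RUN   hpc_short  ui-hpc      hpc-200-06-05  hostname   Oct  1 10:00' \
    | job_state 1234
```

Please keep the polling interval generous so as not to overload the batch system.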

Killing Submitted Jobs

To kill submitted jobs, use the bkill command, e.g.:

bkill <JOBID>

Submitting MPI Jobs via mpirun.lsf (obsolete)

Currently only OpenMPI jobs have been tested on the HPC cluster.
To submit an OpenMPI job, follow these steps:

Set the right environment in the .bashrc in your User Interface home directory as shown below (since home directories are shared, this will be set automatically on all the nodes):

[cesinihpc@ui-hpc ~]$ cat .bashrc

# .bashrc 
# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi
export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH

Make sure the .bash_profile in your home directory exists. If not, create it:

[cesinihpc@ui-hpc ~]$ cat .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

Place your executables in a location accessible by your user: the home directory or the GPFS shared area (/storage/gpfs_maestro/hpc/user/<your_username>/).
Create the following wrapper script for your executable:

[cesinihpc@ui-hpc ~]$ cat cpmpi_test.sh
#!/bin/sh

#can do initial environment setup here if needed
#export <something if needed>
echo "------------------------------------------------"

/usr/share/lsf/9.1/linux2.6-glibc2.3-x86_64/bin/mpirun.lsf env PSM_SHAREDCONTEXTS_MAX=8 <PATH_TO_YOUR_EXECUTABLE>

PLEASE NOTE: mpirun.lsf has to be used instead of standard mpirun!

PLEASE NOTE: PSM_SHAREDCONTEXTS_MAX=8 has to be set if you are not using whole nodes, i.e. if the number of MPI processes is not a multiple of 32 with 32 processes per node (ptile in LSF). If you are using whole nodes you can skip this, and your job will use the maximum number of shared contexts available on a node (which is 16). If you are not using whole nodes and do not set PSM_SHAREDCONTEXTS_MAX to a number lower than 16, the next job landing on the same node will probably fail.

PLEASE NOTE: do not set the number of nodes in the mpirun.lsf command; it is given in the bsub command and handled by LSF.

Launch the bsub command in this way:

bsub -q <queue_name>  -a openmpi -n 32 -R "span[ptile=16]" -o testmpi.out -e testmpi.err /usr/share/lsf/local/hpc/bin/cpmpi_test.sh

The option -R "span[ptile=16]" sets the number of processes per node. The maximum is 32.

If you want to select specific nodes you can use the option "-m", e.g. -m 'hpc-200-06-02 hpc-200-06-03 hpc-200-06-04'.
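
The relation between "-n", ptile and the whole-node rule for PSM_SHAREDCONTEXTS_MAX can be checked before submitting. The sketch below uses the 32 cores per node stated above; the helper names nodes_needed and uses_whole_nodes are hypothetical.

```shell
#!/bin/sh
# Number of nodes needed for <nprocs> processes at <ptile> processes per node
# (rounded up).
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Succeeds when the job fills whole nodes: ptile equals the 32 cores per node
# and the number of processes is a multiple of 32.
uses_whole_nodes() {
    [ "$2" -eq 32 ] && [ $(( $1 % 32 )) -eq 0 ]
}

# The example submission above: -n 32 with span[ptile=16]
nodes_needed 32 16
if uses_whole_nodes 32 16; then
    echo "whole nodes: PSM_SHAREDCONTEXTS_MAX can be left unset"
else
    echo "partial nodes: set PSM_SHAREDCONTEXTS_MAX=8"
fi
```

For the example submission this reports 2 nodes, each only half filled, so the variable has to be set.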

An MPI submission script

To hide the complexity of the submission command syntax, you can use the cnaf_launcher.sh script to launch OpenMPI jobs from the User Interface.
Just customize the first lines according to your needs.
(Thanks to S.Sinigardi for sharing it)

Alternative MPI multinode submission

It is possible to avoid mpirun.lsf and set the mpirun machine file dynamically in the following way:

1) Automatically create the machine file to be used by mpirun:

  echo $LSB_HOSTS | awk '{split($0,array," ")} END {for (i in array) printf ("%s\n",array[i])}' | awk '{count[$0]++} END {for (word in count) print word,"slots=" count[word]}' > /home/HPC/username/mymachine.txt

2) Use this command to launch mpirun, pointing it at the machine file created in step 1:

  mpirun --machinefile /home/HPC/username/mymachine.txt -x PSM_SHAREDCONTEXTS_MAX=8 -np $LSB_DJOB_NUMPROC /home/HPC/username/executablename

A possible bsub submission is:

  bsub -q hpc_inf_SL7  -n 16 -R "span[ptile=8]" -o testmpimy.out -e testmpimy.err /home/HPC/username/run_this_example.sh

where the run_this_example.sh script runs the previous commands:

---- run_this_example.sh ----

  #!/bin/bash

  # Build the machine file: one "<host> slots=<count>" line per host in $LSB_HOSTS
  echo $LSB_HOSTS | awk '{split($0,array," ")} END {for (i in array) printf ("%s\n",array[i])}' | awk '{count[$0]++} END {for (word in count) print word,"slots=" count[word]}' > /home/HPC/username/mymachine.txt

  # Launch the executable on the hosts listed in that machine file
  mpirun --machinefile /home/HPC/username/mymachine.txt -x PSM_SHAREDCONTEXTS_MAX=8 -np $LSB_DJOB_NUMPROC /home/HPC/username/executablename
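
The machine file step can also be written with tr, sort and uniq, which some may find easier to read than the two awk passes. This is an alternative sketch, not the original recipe: it assumes, like the pipeline above, that LSB_HOSTS holds one space-separated hostname per allocated slot. The helper name make_machinefile and the host names in the demonstration are hypothetical.

```shell
#!/bin/sh
# Turn a space-separated host list (one entry per slot) into Open MPI
# machine file lines of the form "<host> slots=<count>", sorted by host name.
make_machinefile() {
    printf '%s\n' "$1" | tr -s ' ' '\n' | awk 'NF' | sort | uniq -c |
        awk '{ print $2, "slots=" $1 }'
}

# Demonstration with a made-up host list (inside a job, pass "$LSB_HOSTS"):
make_machinefile "hpc-a hpc-a hpc-b hpc-a"
```

Unlike the awk version, the output order is deterministic because the hosts are sorted first.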

Submitting GPU Jobs

Prepare a job wrapper like the following, setting the required environment:

[cesinihpc@ui-hpc ~]$ cat test_2gpu_lsf.sh
#!/bin/sh
export BASE=/usr/local/cuda-5.5/
export PATH=$BASE/bin:$PATH 
export C_INCLUDE_PATH=$BASE/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=$BASE/include:$CPLUS_INCLUDE_PATH
export LD_LIBRARY_PATH=$BASE/lib:$BASE/lib64:/usr/local/cuda-5.0/lib64/:$LD_LIBRARY_PATH

#env
#echo "------------------------------------------------"
#now your GPU executable
/home/HPC/cesinihpc/test_2gpu.exe
# if it is a GPU and OpenMPI job:
# /usr/share/lsf/9.1/linux2.6-glibc2.3-x86_64/bin/mpirun.lsf env PSM_SHAREDCONTEXTS_MAX=8 /home/HPC/cesinihpc/test_2gpu.exe
# remember to add the options "-a openmpi" and "-n <NP>" to the bsub command
#############

Submit the job wrapper, selecting a GPU-enabled node with the "-R" option, e.g.:

bsub -q hpc_inf -R "select [gpu_model0=='TeslaK20m' && gpu_model1=='TeslaK20m' ] rusage [ngpus_excl_p=2]" -o jtest.out -e jtest.err /home/HPC/cesinihpc/test_2gpu_lsf.sh

The -R option shown in the example selects a node with two Tesla K20 GPUs. Customise it according to your requirements.

If your job does not use many CPU cores and the site is fully used by CPU-only jobs, you can submit GPU jobs to the hpc_gpu queue, which gives access to "extra" cores not accessible via the hpc_inf queue.
The hpc_gpu queue can use only 2 cores, and only on the nodes where GPUs are installed.
The hostgroups gpuk20 and gpuk40 have been defined to simplify the submission command.
For example:

bsub -q hpc_gpu  -m gpuk40 -R "rusage [ngpus_excl_p=2]" -o jtest.out -e jtest.err /home/HPC/cesinihpc/test_2gpu_lsf.sh

PLEASE NOTE: the "rusage" directive is important. Set it to the number of GPUs you need on the node.
LSF will subtract the number of GPUs specified in "rusage" from the number of GPUs available to other jobs on the node.

If your job is also an OpenMPI job add the options -a openmpi and -n <NP> and use the mpirun.lsf launcher in the wrapper as described in the previous section.
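
The "-R" string for a given GPU model and count can be generated rather than typed by hand. The sketch below follows the pattern of the example above and assumes the per-GPU resources are named gpu_model0, gpu_model1 and so on; the helper name gpu_resource_string is hypothetical.

```shell
#!/bin/sh
# Build an LSF resource requirement string that selects nodes whose first
# <ngpus> GPUs are of the given model and reserves that many GPUs.
# Arguments: <gpu_model> <ngpus>
gpu_resource_string() {
    model="$1"; ngpus="$2"
    sel=""; i=0
    while [ "$i" -lt "$ngpus" ]; do
        if [ -n "$sel" ]; then
            sel="$sel && "
        fi
        sel="${sel}gpu_model${i}=='${model}'"
        i=$(( i + 1 ))
    done
    printf 'select[%s] rusage[ngpus_excl_p=%s]\n' "$sel" "$ngpus"
}

# The two-GPU Tesla K20 request from the example above:
gpu_resource_string TeslaK20m 2

# Possible use on the cluster:
#   bsub -q hpc_inf -R "$(gpu_resource_string TeslaK20m 2)" -o jtest.out -e jtest.err <your_wrapper>
```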

Submitting Interactive Jobs

To allow interactive access to the nodes for debugging, testing and compiling, an interactive shell can be opened on a node by submitting a job with the option -Is, e.g.:

bsub -q hpc_int -Is /bin/bash

PLEASE NOTE: After about two hours you will be logged out! Do not use the interactive shell to run real, long jobs.

Getting Support

For information and support, write to:

hpc-support <_at_> lists.cnaf.infn.it
