====== Using The HPC Cluster ======
Usage instructions have moved here: https://confluence.infn.it/display/TD/6+-+The+HPC+cluster
The old documentation below is kept for reference only:
===== Requesting Access =====
To access the cluster you should first obtain an account at CNAF following the procedure you can find at [[https://www.cnaf.infn.it/en/users-faqs/|this link]].\\
In the application form specify in the "reason" field that you need to access the HPC cluster.\\
Please specify "Daniele Cesini" as contact person.\\
===== Log to the User Interface =====
Once the CNAF account has been provided, you can log in to the bastion host. \\
This is not your user interface!\\ \\
To access the cluster from the bastion log into:\\
ui-hpc.cr.cnaf.infn.it
using the same bastion credentials.
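A typical login sequence is sketched below; the bastion hostname shown (bastion.cnaf.infn.it) is an assumption, use the one provided with your CNAF account:\\
# from your workstation, log into the CNAF bastion host (hostname assumed; use the one in your account e-mail)
ssh <your_cnaf_username>@bastion.cnaf.infn.it
# from the bastion, reach the HPC user interface with the same credentials
ssh <your_cnaf_username>@ui-hpc.cr.cnaf.infn.it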
===== Getting Support =====
Information and support can be requested at:
hpc-support <_at_> lists.cnaf.infn.it
===== Home Directory and extra disk space =====
Your home directory:\\
/home/HPC/
in the user interface is shared among all the cluster nodes.\\
\\ No quotas are currently enforced on the home directories, and only about 4 TB are available in the /home partition.\\
\\ If you need more disk space for data and checkpointing, every user can access the following directory:\\
/storage/gpfs_maestro/hpc/user//
which is on a shared gpfs storage. \\
Please do not leave huge unused files in either the home directories or the gpfs storage areas. Quotas will be enforced in the near future.
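Since no quotas are enforced yet, it is good practice to check how much space you are using; a minimal sketch, assuming your per-user directory sits directly under the gpfs path above:\\
# overall occupancy of the shared /home partition (about 4 TB in total)
df -h /home
# space used by your own data (replace the placeholder with your username)
du -sh /home/HPC/<your_username>
du -sh /storage/gpfs_maestro/hpc/user/<your_username>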
===== The LSF Batch System =====
The cluster is managed and accessible via the LSF (version 9.1.2) batch system.\\
A detailed LSF user guide can be found [[http://www-947.ibm.com/support/entry/portal/documentation_expanded_list/platform_computing/platform_lsf?productContext=1731843200|at this IBM page]].
What follows is a minimal how-to describing the basic operations needed to access the CNAF HPC cluster for the various job types.\\
==== Querying the Cluster Status with LSF ====
To obtain an overview of the node status:\\
bhosts -w
To obtain the queues status:\\
bqueues
Add the option **"-l"** to obtain detailed information. \\
Currently four queues have been defined:
* **hpc_inf** : Max CPU Time is 128 core-days (24 hours using 128 cores or 48 hours using 64 cores) and Max WallClock Time is 79 hours. \\
* **hpc_short** : Max CPU Time is 32 core-days (24 hours using 32 cores or 48 hours using 16 cores) and Max WallClock Time is 79 hours.\\
* **hpc_gpu:** Max WallClockTime in this queue is 33 hours. ** To be used only for jobs requiring GPUs and using few CPU (see below).**\\
* **hpc_int:** Max WallClockTime in this queue is 2 hours. ** To be used only for interactive jobs (see below).**\\ \\
To obtain nodes load information: \\
lsload
Much more detail is available with: \\
lsload -l
To restrict the lsload query to numerical fields (e.g. io and r15s), use the **"-I"** option: \\
lsload -I io:r15s
To restrict the query to string fields, use the **"-s"** option (e.g. gpu_model0): \\
lsload -s gpu_model0
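For convenience, the queries above can be collected into a small status script; a sketch using only the commands already shown:\\
#!/bin/bash
# quick cluster overview: node states, queue status and load
bhosts -w            # node status, wide format
bqueues              # queue status
lsload -I io:r15s    # per-node I/O rate and 15-second run queue length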
==== Submitting Single Batch Jobs ====
Single batch jobs can be submitted via the **bsub** command. \\
Use option **"-o"** and **"-e"** to redirect standard output and standard error. \\
Option **"-m"** selects specific nodes if needed. \\ I.e: \\
bsub -o test.out -e test.err /usr/bin/whoami
bsub -o test.out -e test.err -m 'hpc-200-06-05' /bin/hostname
=== Standard Output and Standard Error ===
As previously stated, standard output and standard error can be redirected with the "-o" and "-e" options of the bsub command. \\
The files generated in this way become available at the end of the job; they are owned by root but can be read and removed by the user. They cannot be edited directly: should you need to edit them, make a copy with "cp" first.\\
To have files that are updated in real time, redirect standard output and error using ">" after the executable name, enclosing the whole command in single quotes, e.g.:\\
bsub -o test.out -e test.err '/usr/bin/whoami > std.out 2>&1'
The single quotes are important, otherwise the output of the bsub command itself will be redirected.\\
=== Check the Job Status ===
Job status can be queried with the **bjobs** command. \\
Use the **"-w"** option to get Wide format. Displays job information without truncating fields \\
Use the **"-W"** and **"-l"** option detailed information about the job. \\
Use the job number to get information of a single job. \\
Use option **"-u"** to get information of a single user jobs.
**"-a"** Displays information about jobs in all states, including recently finished jobs. \\
I.e: \\
bjobs -W
bjobs -l
bjobs -a -u
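To follow a job until it leaves the queue, a simple polling loop can be used; a minimal sketch assuming a hypothetical job ID 12345:\\
#!/bin/bash
# poll the status of a single job (hypothetical ID 12345) every 60 seconds
JOBID=12345
while bjobs "$JOBID" 2>/dev/null | grep -qE 'PEND|RUN'; do
    bjobs -w "$JOBID"
    sleep 60
done
echo "Job $JOBID has finished (or was not found)"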
=== Killing Submitted Jobs ===
To kill submitted jobs, launch the **bkill** command followed by the job ID. E.g.: \\
bkill
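The job ID to pass to bkill is printed by bsub at submission time; the following sketch captures it and then kills the job (the submitted command is just a placeholder):\\
# submit a test job and capture its ID from the "Job <NNN> is submitted..." message
JOBID=$(bsub -o test.out -e test.err /bin/sleep 600 | sed -n 's/Job <\([0-9]*\)>.*/\1/p')
echo "submitted job $JOBID"
# kill it again
bkill "$JOBID"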
==== Submitting MPI Jobs via mpirun.lsf (obsolete)====
Currently only **OpenMPI** jobs have been tested on the HPC cluster. \\
To submit an OpenMPI job, please follow these steps:\\
* Set the right environment in the **.bashrc** in your User Interface home directory as shown below (since home directories are shared, this will automatically apply on all the nodes):\\
[cesinihpc@ui-hpc ~]$ cat .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
export PATH=/usr/lib64/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH
* Make sure the **.bash_profile** in your home directory exists. If not, create it:\\
[cesinihpc@ui-hpc ~]$ cat .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
* Place your executables in a location accessible by your user: the home directory or the gpfs shared area (/storage/gpfs_maestro/hpc/user//). \\ \\
* Create the following wrapper script for your executable:\\
[cesinihpc@ui-hpc ~]$ cat cpmpi_test.sh
#!/bin/sh
#can do initial environment setup here if needed
#export
echo "------------------------------------------------"
#launch your MPI executable through mpirun.lsf (replace the placeholder with the path to your executable)
/usr/share/lsf/9.1/linux2.6-glibc2.3-x86_64/bin/mpirun.lsf env PSM_SHAREDCONTEXTS_MAX=8 <path_to_your_mpi_executable>
**PLEASE NOTE:** **mpirun.lsf** has to be used instead of standard mpirun! \\ \\
**PLEASE NOTE:** **PSM_SHAREDCONTEXTS_MAX=8** has to be used if you are not using whole nodes (i.e. not using a number of MPI processes that is a multiple of 32, with 32 processors per node, ptile in LSF). If you are using whole nodes you can skip this, and your job will use the maximum number of shared contexts available on a node (which is 16). If you are not using whole nodes and do not set the PSM_SHAREDCONTEXTS_MAX variable to a number lower than 16, the next job landing on the same node will probably fail.\\ \\
**PLEASE NOTE:** do not set the number of nodes in the mpirun.lsf command; it will be set in the bsub command and handled by LSF. \\ \\
* **Launch the bsub command in this way (specifying the queue name after -q):**\\
bsub -q -a openmpi -n 32 -R "span[ptile=16]" -o testmpi.out -e testmpi.err '/usr/share/lsf/local/hpc/bin/cpmpi_test.sh'
The option **-R "span[ptile=16]"** selects the number of processes per node that will be used. The maximum is 32. \\
If you want to select specific nodes you can use the option "-m", e.g. -m 'hpc-200-06-02 hpc-200-06-03 hpc-200-06-04'. \\
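Putting the steps above together, a complete submission could look like the following sketch; the queue name hpc_inf is taken from the queue list above, and the wrapper path (in the user's home directory) is only an example:\\
# 32 MPI slots, 16 processes per node; adjust queue, slot count and wrapper path to your case
bsub -q hpc_inf -a openmpi -n 32 -R "span[ptile=16]" -o testmpi.out -e testmpi.err '/home/HPC/<your_username>/cpmpi_test.sh'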
==== An MPI submission script ====
To hide the complexity of the submission command syntax you can use the {{:strutture:cnaf:clusterhpc:cnaf_launcher_sh.txt|cnaf_launcher.sh script}} to launch OpenMPI jobs from the User Interface.
\\ Just customize the first lines according to your needs. \\
(Thanks to S.Sinigardi for sharing it)\\
==== Alternative MPI multinode submission ====
It is possible to avoid the use of mpirun.lsf and dynamically set the mpirun machine file in the following way:
1) Automatically create the machine file to be used by mpirun:
echo $LSB_HOSTS | awk '{split($0,array," ")} END {for (i in array) printf ("%s\n",array[i])}' | awk '{count[$0]++} END {for (word in count) print word,"slots=" count[word]}' > /home/HPC/username/mymachine.txt
2) Use this command to launch mpirun:
mpirun --machinefile /home/HPC/username/mymachine.txt -x PSM_SHAREDCONTEXTS_MAX=8 -np $LSB_DJOB_NUMPROC /home/HPC/username/executablename
A possible bsub submission is:
bsub -q hpc_inf_SL7 -n 16 -R "span[ptile=8]" -o testmpimy.out -e testmpimy.err /home/HPC/username/run_this_example.sh
where the run_this_example.sh script runs the previous commands:
----run_this_example.sh----
#!/bin/bash
echo $LSB_HOSTS | awk '{split($0,array," ")} END {for (i in array) printf ("%s\n",array[i])}' | awk '{count[$0]++} END {for (word in count) print word,"slots=" count[word]}' > /home/HPC/username/mymachine.txt
mpirun --machinefile /home/HPC/username/mymachine.txt -x PSM_SHAREDCONTEXTS_MAX=8 -np $LSB_DJOB_NUMPROC /home/HPC/username/executablename
==== Submitting GPU Jobs ====
* Prepare a job wrapper like the following, setting the needed environment: \\
[cesinihpc@ui-hpc ~]$ cat test_2gpu_lsf.sh
#!/bin/sh
export BASE=/usr/local/cuda-5.5/
export PATH=$BASE/bin:$PATH
export C_INCLUDE_PATH=$BASE/include:$C_INCLUDE_PATH
export CPLUS_INCLUDE_PATH=$BASE/include:$CPLUS_INCLUDE_PATH
export LD_LIBRARY_PATH=$BASE/lib:$BASE/lib64:/usr/local/cuda-5.0/lib64/:$LD_LIBRARY_PATH
#env
#echo "------------------------------------------------"
#now your GPU executable
/home/HPC/cesinihpc/test_2gpu.exe
# if it is a GPU and OpenMPI job:
# /usr/share/lsf/9.1/linux2.6-glibc2.3-x86_64/bin/mpirun.lsf env PSM_SHAREDCONTEXTS_MAX=8 /home/HPC/cesinihpc/test_2gpu.exe
# remember to add the options "-a openmpi" and "-n" in the bsub command
#############
* Submit the job wrapper selecting a GPU-enabled node using the "-R" option, e.g.: \\
bsub -q hpc_inf -R "select [gpu_model0=='TeslaK20m' && gpu_model1=='TeslaK20m' ] rusage [ngpus_excl_p=2]" -o jtest.out -e jtest.err /home/HPC/cesinihpc/test_2gpu_lsf.sh
The **-R** option shown in the example selects a node with **two Tesla K20 GPUs**. Customise it according to your requirements. \\ \\
If your job does not use many CPU cores and the cluster is fully occupied by CPU-only jobs, you can submit GPU jobs to the **hpc_gpu queue**, which gives access to "extra" cores not reachable via the hpc_inf queue. \\
The hpc_gpu queue can use **only 2 cores**, and only on the nodes where the GPUs are installed. \\
The hostgroups gpuk20 and gpuk40 have been defined to simplify the submission command. \\
I.e. : \\
bsub -q hpc_gpu -m gpuk40 -R "rusage [ngpus_excl_p=2]" -o jtest.out -e jtest.err /home/HPC/cesinihpc/test_2gpu_lsf.sh
**PLEASE NOTE:** the **"rusage"** directive is important - set it to the number of GPUs you need in the node. \\
LSF will subtract the number of GPUs specified in "rusage" from the GPUs available to other jobs on the node. \\ \\
If your job is also an **OpenMPI job**, add the options **-a openmpi** and **-n** (with the number of slots) and use the mpirun.lsf launcher in the wrapper as described in the previous section; see the sketch below.\\ \\
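For example, a combined GPU + OpenMPI submission on a K40 node could look like the following sketch (2 slots and 2 GPUs; the wrapper path is only an example, and the wrapper is assumed to launch the executable via mpirun.lsf as in the commented line of the wrapper above):\\
# request 2 slots and 2 GPUs on a gpuk40 host; the wrapper launches the executable via mpirun.lsf
bsub -q hpc_gpu -m gpuk40 -a openmpi -n 2 -R "rusage [ngpus_excl_p=2]" -o jtest.out -e jtest.err /home/HPC/<your_username>/test_2gpu_lsf.sh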
==== Submitting Interactive Jobs ====
To allow interactive access to the nodes for debugging, testing and compiling purposes, an interactive shell can be opened on the nodes by submitting a job with **the option -Is**, e.g.: \\
bsub -q hpc_int -Is /bin/bash
**PLEASE NOTE:** After about two hours you will be logged out! **Do not** use the interactive shell to run long production jobs. \\
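A typical interactive session for compiling could look like this sketch (the compile command is only an example):\\
# open an interactive shell on a worker node (you will be logged out after about two hours)
bsub -q hpc_int -Is /bin/bash
# ...compile and run quick tests on the node, for example:
#   mpicc -O2 -o mytest mytest.c
#   ./mytest
# then leave the node before the time limit expires
exit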