====== GALILEO ======

=== NEWS ===

2015/02/23 Job accounting starts today. Users must be authorized by the "Iniziativa Specifica" to which they belong.

2015/01/28 Galileo is open to INFN users for the "pre-production" tests. The account for this activity is **INFNG_test**. The MIC accelerators are not available yet.

[[http://www.hpc.cineca.it/content/galileo| What is GALILEO]]

===== GALILEO login =====

  * In order to become a CINECA user you have to register on the CINECA UserDB ( https://userdb.hpc.cineca.it/user ). The procedure creates a new **username** associated with your identity (skip this step if you already have a CINECA username).

  * Each user must be associated with the Account related to their "Iniziativa Specifica" (IS); this is needed for the accounting of the consumed budget. Please contact the person responsible for your IS in order to be enabled.

  * At the end of the previous steps you can access the Galileo front-end **login.galileo.cineca.it** via ssh or other standard tools (see the example below): [[http://www.hpc.cineca.it/content/access-systems-0 | Access to the systems]]

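A minimal access sketch, assuming the username created in the UserDB (replace <username> accordingly):

  ssh <username>@login.galileo.cineca.it
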
The following command displays the accounts associated with your username and the corresponding usage:

  login> saldo -b

===== GALILEO usage =====

[[http://www.hpc.cineca.it/content/general-information-0 | General Information]]
-
[[http://www.hpc.cineca.it/content/stay-tuned | Get in touch]]

==== Architecture ====

[[ http://www.hpc.cineca.it/content/galileo#systemarchitecture | Galileo Architecture]]

<code>
Model: IBM NeXtScale  -  Architecture: Linux Infiniband Cluster
Nodes: 516
Processors: 8-core Intel Haswell 2.40 GHz (2 per node)
Cores: 16 cores/node, 8256 cores in total
Accelerators: 2 Intel Phi 7120p per node on 384 nodes (768 in total)
RAM: 128 GB/node, 8 GB/core
Internal Network: Infiniband with 4x QDR switches
Disk Space: 2,500 TB of local storage
Peak Performance: xxx TFlop/s (to be defined)
</code>

To get on-line details:

  login> pbsnodes -a | egrep '(Mom|available.mem|available.cpuspeed|available.nmics|available.ngpus)'

==== Batch scheduler ====

The job management facility adopted by CINECA is PBS:
[[http://www.hpc.cineca.it/content/batch-scheduler-pbs-0 | Batch Scheduler PBS ]]

Routing queue "route": this is the default queue.
You only have to declare how many resources you need and your job will be directed to the right execution queue with the right priority.
Normal parallel jobs are routed to the "shared" execution queue. The maximum number of nodes you can request is 128, with a maximum walltime of 24 hours.

Script example (script.pbs):
<code>
#!/bin/bash
#PBS -N prova                           # job name
#PBS -l walltime=02:00:00               # requested walltime
#PBS -l select=16:ncpus=16:mpiprocs=16  # 16 chunks (nodes) with 16 cores and 16 MPI ranks each
#PBS -A INFNG_test                      # account to be charged
#
module load intel/cs-xe-2015--binary
module load intelmpi/5.0.2--binary
cd working_dir
mpirun executable
</code>

Submit your job:
  qsub script.pbs
Monitor your job:
  qstat [-u username]
Cancel your job:
  qdel JOB.id
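
For example, a typical cycle with the script above might look like this (the job id returned by qsub is only illustrative):

  login> qsub script.pbs        # prints the job id, e.g. 123456.<server>
  login> qstat -u $USER         # check the status of your jobs
  login> qdel 123456            # remove the job if no longer needed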

Interactive example (option -I):

  qsub  -l select=1:ncpus=16  -A INFNG_test -I
  > cat $PBS_NODEFILE
  > exit

== Default values assigned by the queue manager ==

  * 1 CPU
  * 8 GB of memory (each node has 128 GB of RAM)
  * Max walltime: 30 minutes
  * MICs: 0
  * MPI processes: 1 per node
  * Core allocation: pack (the requested CPUs are packed onto the smallest number of nodes)
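
For example, an interactive request that specifies no resource list falls back entirely on these defaults (1 CPU, 8 GB of memory, 30 minutes of walltime):

  qsub -A INFNG_test -I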

== More complex requests ==

  qsub -A INFNG_test -I  -l ncpus=16,walltime=24:00:00                # ask for 16 CPUs and 1 day of walltime
  qsub -A INFNG_test -I  -l select=2:ncpus=16:mem=120gb               # ask for 2 chunks of 16 cores each (2 whole nodes)
  qsub -A INFNG_test -I  -l select=16:ncpus=1,place=scatter           # each chunk is allocated on a separate host (default)
  qsub -A INFNG_test -I  -l select=16:ncpus=1,place=pack              # all chunks are allocated from vnodes on the same host
  qsub -A INFNG_test -I  -l select=2:ncpus=16:mem=124gb:nmics=2       # ask for 2 whole nodes including MICs (16 cores, 124 GB and 2 MICs per node)
  qsub -A INFNG_test -I  -l select=2:ncpus=16:mem=120gb:mpiprocs=1    # PBS_NODEFILE includes 1 instance per node (default)
  qsub -A INFNG_test -I  -l select=2:ncpus=16:mem=120gb:mpiprocs=16   # PBS_NODEFILE includes 16 instances per node

==== Storage ====

CINECA documentation: [[ http://www.hpc.cineca.it/content/galileo#disks | Galileo Disks and file system]]

  $HOME             (/galileo/home/userexternal/<username>)   (permanent, backed up)
  $CINECA_SCRATCH   (/gpfs/scratch/userexternal/<username>)   (temporary)
  $WORK             (/gpfs/work/<YOUR_GROUP_ACCOUNT_AREA>)
Use the local command "cindata" to query disk usage and quota ("cindata -h" for help):

  cindata
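
A common pattern (the directory name is only an example) is to keep sources and scripts in $HOME and to run jobs from the scratch area:

  cp -r $HOME/myrun $CINECA_SCRATCH/
  cd $CINECA_SCRATCH/myrun
  qsub script.pbs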

==== Software Environment ====

  * OS: RedHat CentOS release 7, 64 bit
  * Compilers, scientific libraries and tools are installed using the **software modules** mechanism (see the example below).

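A minimal session with the modules mechanism (the module name is taken from the batch example above; other names may differ):

  module avail                            # list the available modules
  module load intel/cs-xe-2015--binary    # load the Intel compiler suite
  module list                             # show the currently loaded modules
  module purge                            # unload all modules
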
CINECA Documentation: [[ http://www.hpc.cineca.it/content/galileo#programming | Programming environment]]
-
[[ http://www.hpc.cineca.it/content/galileo#compiler | Compilers]]
-
[[http://www.hpc.cineca.it/content/galileo#debugger | Debuggers and profilers]]

===== MIC job submission (work in progress) =====

[[http://www.hpc.cineca.it/content/quick-guide-intel-mic-usage | CINECA quick guide]]
-
[[http://www.prace-ri.eu/Best-Practice-Guide-Intel-Xeon-Phi-HTML?lang=en | PRACE best practice guide]]

== Compilation ==

  * Log in to a mic-node using:

     qsub -A INFNG_test -I -l select=1:ncpus=16:nmics=2   # select a whole node with 2 MICs

  * Load the needed modules and set the variables:

     module load intel intelmpi mkl
     source $INTEL_HOME/bin/compilervars.sh intel64
     export I_MPI_MIC=enable

  * Compile (see the sketch below).
  * Exit.
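
A compilation sketch using the Intel MPI compiler wrapper mpiicc; the file names and the choice between an offload build and a native MIC build are only illustrative:

     # offload build: runs on the host and offloads regions marked with offload pragmas to the MIC
     mpiicc -o exe-offload.x code.c
     # native build: the whole binary runs on the MIC (note the -mmic flag)
     mpiicc -mmic -o exe-native.mic code.c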

== Execution on a mic-node ==

     qsub -A INFNG_test -I -l select=1:ncpus=16:nmics=2
     module load intel
     module load intelmpi
     source $INTEL_HOME/bin/compilervars.sh intel64
     ./exe-offload.x

== Execution using PBS from the front-end ==

Example PBS file:

<code>
#!/bin/bash
#PBS -l select=1:ncpus=16:nmics=2
#PBS -l walltime=00:20:00
#PBS -A INFNG_test

# load required modules
module load intel intelmpi mkl
source $INTEL_HOME/bin/compilervars.sh intel64
export I_MPI_MIC=enable

# derive the MIC host names from the first allocated node
# (e.g. node123.domain -> node123-mic0.domain)
export MIC0=$(head -n 1 $PBS_NODEFILE | sed "s/\([^.]*\)\./\1-mic0./")
export MIC1=$(head -n 1 $PBS_NODEFILE | sed "s/\([^.]*\)\./\1-mic1./")
cd <workdir>

# library paths for the MIC side of the run
export MIC_PATH=
export MIC_PATH=$MIC_PATH:/eurora/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013/mkl/lib/mic/
export MIC_PATH=$MIC_PATH:/eurora/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013/lib/mic

mpirun -genv LD_LIBRARY_PATH $MIC_PATH -host ${MIC0},${MIC1} -perhost 1 ./imb/3.2.4/bin/IMB-MPI1.mic pingpong
</code>
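
The file can then be submitted from the front-end like any other batch job (the file name is only an example):

  qsub mic_job.pbs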

----

// 2015/02/23//
