====== EURORA ======

[[http://www.cineca.it/en/content/eurora | What is EURORA]]

===== EURORA login =====

  * In order to become a CINECA user you have to register yourself on the CINECA UserDB ( https://userdb.hpc.cineca.it/user ). The procedure will create a new **username** associated with your identity (skip this step if you already have a CINECA username).
  * Once you have received your username and password, send an e-mail to superc@cineca.it requesting to be enabled on the EURORA system in the framework of the INFN-CINECA agreement. CINECA will associate your username with the **CON13_INFN** account (every account has a budget and a set of usernames associated with it). http://www.hpc.cineca.it/content/accounting-0
  * At the end of the previous step you can access the Eurora front-end **login.eurora.cineca.it** via ssh or other standard tools: [[http://www.hpc.cineca.it/content/access-systems-0 | Access to the systems]]

The command to view the usage summary of the accounts associated with your username is the following:

  login> saldo -b

===== Eurora usage =====

[[http://www.hpc.cineca.it/content/general-information-0 | General Information]] - [[http://www.hpc.cineca.it/content/stay-tuned | Get in touch]] - [[http://www.hpc.cineca.it/content/eurora-user-guide | Eurora User guide]]

==== Architecture ====

[[http://www.hpc.cineca.it/hardware/eurora | Eurora Architecture]]

  * 64 computing nodes: E5-2660 Sandy Bridge, 16 cores each (node001 -> node064)
  * CPU speed: 2 GHz (node001 -> node032) or 3 GHz (node033 -> node064)
  * Memory: 32 GB (node039-044-055-064) or 16 GB (other nodes)
  * GPUs: 2 NVIDIA K20 per node (32 nodes)
  * MICs: 2 MICs (Xeon Phi 5120D) per node (60 cores x 4 threads each), named nodeXXX-mic0 and nodeXXX-mic1 (32 nodes)
  * Each MIC: 1.053 GHz, 512-bit SIMD (1056 GFlops DP, 2012 GFlops SP peak), 8 GB RAM (352 GB/s peak)
  * Infiniband QDR: 1.1 us MPI latency, 40 Gb/s (4x) (8 Gb/s (1x) - 96 Gb/s (12x))
  * GPUs and MICs are connected to the host via PCIe (gen 2? 8 GB/s (16x))
  * Total peak performance: 150 TFlops

  login> pbsnodes -a | egrep '(Mom|available.mem|available.cpuspeed|available.nmics|available.ngpus)'

==== Batch scheduler ====

The job management facility adopted by CINECA is PBS: [[http://www.hpc.cineca.it/content/batch-scheduler-pbs-0 | Batch Scheduler PBS]]

Available queues:

  * debug (max 2 nodes, 1/2 hour)
  * parallel (max 44 nodes, 6 hours)
  * longpar (max 22 nodes, 24 hours)

Script example (script.pbs):

  #PBS -q debug
  #PBS -l select=2:ncpus=16:mem=15GB:cpuspeed=3GHz
  #PBS -A CON13_INFN
  ...

Submit your job:

  qsub script.pbs

Monitor your job:

  qstat [-u username]

Cancel your job:

  qdel JOB.id

Interactive example (option -I):

  qsub -q debug -l nodes=node021:ncpus=1 -A CON13_INFN -I
  > cat $PBS_NODEFILE
  > exit

Asking for more memory, e.g. to allow demanding compilations:

  qsub -q debug -l nodes=node021:ncpus=16:mem=15gb -A CON13_INFN -I

==== Storage ====

[[http://www.hpc.cineca.it/content/data-storage-and-filesystems-0 | Data storage and file systems]]

  * $HOME (/eurora/home/userexternal/) (permanent, backed up)
  * $CINECA_SCRATCH (/gpfs/scratch/userexternal/) (temporary)

Use the local command "cindata" to query disk usage and quota ("cindata -h" for help):

  cindata

==== Software Environment ====

  * OS: RedHat CentOS release 6.3, 64 bit
  * Compilers, scientific libraries and tools are installed using the **software modules** mechanism (see the sketch below).
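A minimal sketch of a typical module session on the Eurora front-end, using the standard Environment Modules commands; the module names shown are simply the ones that appear in the examples further down this page:

  # list the software modules available on the system
  module avail
  # load a compiler / library module into the current environment
  module load intel
  module load mkl
  # show the modules currently loaded
  module list
  # unload all loaded modules before switching to a different environment
  module purge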
http://www.hpc.cineca.it/content/eurora-user-guide#programming

NOTE: The MIC system libraries are distributed through the following shared directories:

  * /cineca/prod/compilers/intel/cs-xe-2013/binary/lib/mic
  * /cineca/prod/compilers/intel/cs-xe-2013/binary/mkl/lib/mic
  * /cineca/prod/compilers/intel/cs-xe-2013/binary/impi/4.1.1.036/mic/lib

===== Job submission =====

Basic set of examples for the different programming models (CPU only, CPU+GPU, CPU+MIC).

==== CPU ====

Example of PBS file:

  #!/bin/bash
  #PBS -l select=2:mpiprocs=2:ncpus=16:mem=15GB:cpuspeed=3GHz
  #PBS -N d2d_bdir-remote
  #PBS -l walltime=00:10:00
  #PBS -q debug
  #PBS -A CON13_INFN

==== GPU ====

http://www.hpc.cineca.it/content/gpgpu-general-purpose-graphics-processing-unit

== Compilation ==

  * log in on a GPU node using the command

  qsub -A CON13_INFN -I -l select=1:ncpus=16:ngpus=2 -q debug

  * load the necessary modules

  module load gnu/4.6.3
  module load cuda/5.0.35
  .....

  * compile
  * exit

== Execution ==

Example of PBS file:

  #!/bin/bash
  #PBS -l select=2:mpiprocs=2:ncpus=16:ngpus=2
  #PBS -N d2d_bdir-remote
  #PBS -l walltime=00:10:00
  #PBS -q debug
  #PBS -A CON13_INFN
  
  # load required modules
  module load gnu
  module load cuda
  
  mpirun .....

==== MIC ====

[[http://www.hpc.cineca.it/content/quick-guide-intel-mic-usage | CINECA quick guide]] - [[http://www.prace-ri.eu/Best-Practice-Guide-Intel-Xeon-Phi-HTML?lang=en | PRACE best practice guide]]

== Compilation ==

  * log in on a MIC node using the command

  qsub -A CON13_INFN -I -l select=1:ncpus=16:nmics=1

  * load the needed modules and set the variables

  module load intel intelmpi mkl
  source $INTEL_HOME/bin/compilervars.sh intel64
  export I_MPI_MIC=enable

  * compile
  * exit

== Execution on a MIC node ==

  qsub -A CON13_INFN -I -l select=1:ncpus=16:nmics=2 -q debug
  module load intel
  module load intelmpi
  source $INTEL_HOME/bin/compilervars.sh intel64
  ./exe-offload.x

== Execution using PBS from the front-end ==

Example of PBS file:

  #!/bin/bash
  #PBS -l select=1:ncpus=16:nmics=2
  #PBS -l walltime=00:20:00
  #PBS -q debug
  #PBS -A CON13_INFN
  
  # load required modules
  module load intel intelmpi mkl
  source $INTEL_HOME/bin/compilervars.sh intel64
  export I_MPI_MIC=enable
  
  # derive the MIC host names (nodeXXX-mic0 and nodeXXX-mic1) from the first host in $PBS_NODEFILE
  export MIC0=$(head -n 1 $PBS_NODEFILE | sed "s/\([^.]*\)\./\1-mic0./")
  export MIC1=$(head -n 1 $PBS_NODEFILE | sed "s/\([^.]*\)\./\1-mic1./")
  
  cd
  
  # library search path for the MIC executables
  export MIC_PATH=
  export MIC_PATH=$MIC_PATH:/eurora/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013/mkl/lib/mic/
  export MIC_PATH=$MIC_PATH:/eurora/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013/lib/mic
  
  mpirun -genv LD_LIBRARY_PATH $MIC_PATH -host ${MIC0},${MIC1} -perhost 1 ./imb/3.2.4/bin/IMB-MPI1.mic pingpong

== Network fabrics ==

http://www.prace-ri.eu/Best-Practice-Guide-Intel-Xeon-Phi-HTML?lang=en#id-1.7.3

Network fabrics available for the Intel Xeon Phi coprocessor: **shm, tcp, ofa, dapl**

The Intel MPI library tries to automatically use the best available network fabric detected (usually shm for intra-node communication and InfiniBand (dapl, ofa) for inter-node communication). The default can be changed by setting the I_MPI_FABRICS environment variable to I_MPI_FABRICS=<fabric> or I_MPI_FABRICS=<intra-node fabric>:<inter-node fabric>, e.g. export I_MPI_FABRICS=shm:dapl. The availability is checked in the following order: shm:dapl, shm:ofa, shm:tcp.

----
// 2013/08/28//