EURORA
EURORA login
- In order to become a CINECA user you have to register on the CINECA UserDB ( https://userdb.hpc.cineca.it/user ). The procedure creates a new username associated with your identity (skip this step if you already have a CINECA username).
- When you receive your username and password, send an e-mail to superc@cineca.it requesting to be enabled on the EURORA system in the framework of the INFN-CINECA agreement. CINECA will associate your username with the CON13_INFN account (every account has a budget and a set of usernames associated with it): http://www.hpc.cineca.it/content/accounting-0
- At the end of the previous step you can access the EURORA front-end login.eurora.cineca.it via ssh or other standard access tools (see the CINECA documentation on access to the systems).
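For example, from a terminal (replace <username> with your CINECA username):
ssh <username>@login.eurora.cineca.it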
The command to view the usage summary of the accounts associated with your username is the following:
login> saldo -b
Eurora usage
Architecture
- 64 Computing Nodes : E5-2660 Sandy Bridge, 16 cores each (node001 → node064)
- CPU speed: 2 GHz (node001 → node032) or 3 GHz (node033 → node064)
- Memory: 32 GB (node039-044-055-064) or 16 GB (others)
- GPUs: 2 NVIDIA K20 per node (32 nodes)
- MICs: 2 MICs (Xeon Phi 5120D) per node (60x4 cores), named nodeXXX-mic0 and nodeXXX-mic1 (32 nodes)
- Each MIC: 1.053 GHz, 512-bit SIMD (1056 GFlops DP, 2012 GFlops SP peak), 8 GB RAM (352 GB/s peak)
- InfiniBand QDR: 1.1 us MPI latency, 40 Gb/s (4x) (8 Gb/s (1x) - 96 Gb/s (12x))
- GPUs and MICs are connected to the host via PCIe (gen 2? 8 GB/s (16x))
- Total Peak perf. 150 TFlops
To list the per-node resources (memory, CPU speed, number of MICs and GPUs):
login> pbsnodes -a | egrep '(Mom|available.mem|available.cpuspeed|available.nmics|available.ngpus)'
Batch scheduler
The job management facility adopted by CINECA is PBS (Portable Batch System).
Available Queues:
- debug (max 2 nodes, 1/2 hour)
- parallel (max 44 nodes, 6 hours)
- longpar (max 22 nodes, 24 hours)
Script example (script.pbs)
#PBS -q debug
#PBS -l select=2:ncpus=16:mem=15GB:cpuspeed=3GHz
#PBS -A INFN_EURORA
...
Submit your job
qsub script.pbs
Monitor your job
qstat [-u username]
Cancel your job
qdel JOB.id
Interactive example (option -I):
qsub -q debug -l nodes=node021:ncpus=1 -A CON13_INFN -I
> cat $PBS_NODEFILE
> exit
Requesting more memory to allow demanding compilations:
qsub -q debug -l nodes=node021:ncpus=16:mem=15gb -A CON13_INFN -I
Storage
$HOME (/eurora/home/userexternal/<username>): permanent, backed up
$CINECA_SCRATCH (/gpfs/scratch/userexternal/<username>): temporary
Use the local command "cindata" to query for disk usage and quota ("cindata -h" for help):
cindata
Software Environment
- OS: RedHat CentOS release 6.3, 64 bit
- Compilers, scientific libraries and tools are installed using the software modules mechanism.
http://www.hpc.cineca.it/content/eurora-user-guide#programming
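For example, to list the installed software and load a module (the exact module names and versions available on the system may differ):
module avail
module load intel
module list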
NOTE: The MIC system libraries are distributed through the following shared directories:
- /cineca/prod/compilers/intel/cs-xe-2013/binary/lib/mic
- /cineca/prod/compilers/intel/cs-xe-2013/binary/mkl/lib/mic
- /cineca/prod/compilers/intel/cs-xe-2013/binary/impi/4.1.1.036/mic/lib
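For offload runs it can be convenient to make these directories visible to the coprocessor-side runtime; a minimal sketch, assuming the standard Intel MIC_LD_LIBRARY_PATH mechanism (adapt the list to what your code actually needs):
export MIC_LD_LIBRARY_PATH=/cineca/prod/compilers/intel/cs-xe-2013/binary/lib/mic:/cineca/prod/compilers/intel/cs-xe-2013/binary/mkl/lib/mic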
Job submission
Basic set of examples for the different programming models (CPU only, CPU+GPU, CPU+MIC)
CPU
Example of PBS file
#!/bin/bash
#PBS -l select=2:mpiprocs=2:ncpus=16:mem=15GB:cpuspeed=3GHz
#PBS -N d2d_bdir-remote
#PBS -l walltime=00:10:00
#PBS -q debug
#PBS -A CON13_INFN
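The directives above only request the resources; a minimal job body (a sketch, assuming an MPI executable my_prog.x built beforehand with the Intel toolchain) could follow them:
# move to the submission directory
cd $PBS_O_WORKDIR
# load the environment used at build time
module load intel intelmpi
# 2 nodes x 2 mpiprocs = 4 MPI tasks
mpirun -np 4 ./my_prog.x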
GPU
Compilation
- log in on one GPU node using the command
qsub -A CON13_INFN -I -l select=1:ncpus=16:ngpus=2 -q debug
- load necessary modules
module load gnu/4.6.3
module load cuda/5.0.35
.....
- compile (a minimal example is sketched after this list)
- exit
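A minimal compilation sketch (source and binary names are hypothetical; -arch=sm_35 matches the K20 GPUs):
nvcc -O2 -arch=sm_35 -o my_gpu_prog my_gpu_prog.cu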
Execution
Example of PBS file
#!/bin/bash
#PBS -l select=2:mpiprocs=2:ncpus=16:ngpus=2
#PBS -N d2d_bdir-remote
#PBS -l walltime=00:10:00
#PBS -q debug
#PBS -A CON13_INFN

# load required modules
module load gnu
module load cuda

mpirun .....
MIC
Compilation
- log in on one MIC node using the command
qsub -A INFNG_test -I -l select=1:ncpus=16:nmics=1
- load needed modules and set variables
module load intel intelmpi mkl
source $INTEL_HOME/bin/compilervars.sh intel64
export I_MPI_MIC=enable
- compile (a minimal example is sketched after this list)
- exit
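A minimal compilation sketch, assuming a C source containing #pragma offload sections (the source name is hypothetical, the executable name matches the exe-offload.x used below; for a native MIC build add -mmic instead):
icc -O2 -o exe-offload.x my_offload.c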
Execution on mic-node
qsub -A CON13_INFN -I -l select=1:ncpus=16:nmics=2 -q debug
module load intel
module load intelmpi
source $INTEL_HOME/bin/compilervars.sh intel64
./exe-offload.x
Execution using PBS from front-end
Example of PBS file
#!/bin/bash
#PBS -l select=1:ncpus=16:nmics=2
#PBS -l walltime=00:20:00
#PBS -q debug
#PBS -A CON13_INFN

# load required modules
module load intel intelmpi mkl
source $INTEL_HOME/bin/compilervars.sh intel64
export I_MPI_MIC=enable

# build the MIC host names (nodeXXX-mic0 / nodeXXX-mic1) from the first host in $PBS_NODEFILE
export MIC0=$(head -n 1 $PBS_NODEFILE | sed "s/\([0-9]*\)\./\1-mic0./")
export MIC1=$(head -n 1 $PBS_NODEFILE | sed "s/\([0-9]*\)\./\1-mic1./")

cd <workdir>

# library path for the executables running on the coprocessors
export MIC_PATH=
export MIC_PATH=$MIC_PATH:/eurora/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013/mkl/lib/mic/
export MIC_PATH=$MIC_PATH:/eurora/prod/compilers/intel/cs-xe-2013/binary/composer_xe_2013/lib/mic

# run one MPI task on each coprocessor (native MIC binary)
mpirun -genv LD_LIBRARY_PATH $MIC_PATH -host ${MIC0},${MIC1} -perhost 1 ./imb/3.2.4/bin/IMB-MPI1.mic pingpong
Network fabrics
http://www.prace-ri.eu/Best-Practice-Guide-Intel-Xeon-Phi-HTML?lang=en#id-1.7.3
Network fabrics available for the Intel Xeon Phi coprocessor: shm, tcp, ofa, dapl
The Intel MPI library tries to automatically use the best available network fabric detected (usually shm for intra-node communication and InfiniBand (dapl, ofa) for inter-node communication).
The default can be changed by setting the I_MPI_FABRICS environment variable: I_MPI_FABRICS=<fabric> or I_MPI_FABRICS=<intra-node fabric>:<inter-node fabric>.
The availability is checked in the following order: shm:dapl, shm:ofa, shm:tcp.
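For example, to force shared memory inside a node and DAPL/InfiniBand between nodes (the first combination checked above):
export I_MPI_FABRICS=shm:dapl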
2013/08/28