cn:csn4:calcolo:suma:eurora_howto
Differences
This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| cn:csn4:calcolo:suma:eurora_howto [2014/02/14 10:24] – [EURORA login] roberto.alfieri@infn.it | cn:csn4:calcolo:suma:eurora_howto [2015/02/07 18:27] (current) – [MIC] roberto.alfieri@infn.it | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | |||
| + | ====== EURORA | ||
| + | |||
| + | [[http:// | ||
| + | |||
| + | ===== EURORA login ===== | ||
| + | |||
| + | * To become a CINECA user, register on the CINECA UserDB ( https:// | ||
| + | |||
| + | * Once you receive your username and password, send an e-mail to superc@cineca.it, | ||
| + | |||
| + | * At the end of the previous step you can access the Eurora front-end | ||
| + | |||
| + | To view the usage summary of the accounts associated with your username, run: | ||
| + | |||
| + | login> saldo -b | ||
| + | ===== Eurora usage ===== | ||
| + | |||
| + | |||
| + | [[http:// | ||
| + | - | ||
| + | [[http:// | ||
| + | - | ||
| + | [[http:// | ||
| + | |||
| + | |||
| + | ==== Architecture ==== | ||
| + | |||
| + | [[http:// | ||
| + | |||
| + | * 64 Computing Nodes : E5-2660 Sandy Bridge, 16 cores each (node001 -> node064) | ||
| + | * CPUspeed: | ||
| + | * Memory: 32GB (node039-044-055-064) or 16GB (others) | ||
| + | * GPUs: 2 NVIDIA K20 per node (32 nodes) | ||
| + | * MICs: 2 MICs (Xeon Phi 5120D) per node (60 cores, 4 hardware threads each), named nodeXXX-mic0 and nodeXXX-mic1 (32 nodes) | ||
| + | * Each MIC: 1.053 GHz, 512-bit SIMD (1056 GFlops DP, 2012 GFlops SP peak), 8 GB RAM (352 GB/s peak) | ||
| + | * InfiniBand QDR: 1.1 us MPI latency, 40 Gb/s (4x) (8 Gb/s (1x) - 96 Gb/s (12x)) | ||
| + | * GPUs and MICs are connected to the host via PCIe (gen 2? 8 GB/s (16x)) | ||
| + | * Total peak performance: 150 TFlops | ||
| + | |||
| + | login> pbsnodes -a | egrep ' | ||
| + | ==== Batch scheduler ==== | ||
| + | |||
| + | The job management facility adopted by CINECA is PBS: | ||
| + | [[http:// | ||
| + | |||
| + | Available Queues: | ||
| + | * debug (max 2 nodes, 1/2 hour) | ||
| + | * parallel (max 44 nodes, 6 hours) | ||
| + | * longpar (max 22 nodes, 24 hours) | ||
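The queue limits above can be turned into a small selection helper. This is only a sketch: the function name `pick_queue` and its nodes/minutes interface are hypothetical, and the limits are simply the ones listed above, which may change on the real system.

```shell
#!/bin/bash
# Hypothetical helper: print the first queue whose limits fit a request.
# Usage: pick_queue NODES WALLTIME_MINUTES
# Limits are copied from the queue list above (assumed still valid).
pick_queue() {
    local nodes=$1 minutes=$2
    if [ "$nodes" -le 2 ] && [ "$minutes" -le 30 ]; then
        echo debug        # max 2 nodes, 1/2 hour
    elif [ "$nodes" -le 44 ] && [ "$minutes" -le 360 ]; then
        echo parallel     # max 44 nodes, 6 hours
    elif [ "$nodes" -le 22 ] && [ "$minutes" -le 1440 ]; then
        echo longpar      # max 22 nodes, 24 hours
    else
        echo "no queue fits: $nodes nodes, $minutes minutes" >&2
        return 1
    fi
}
```

For example, `qsub -q "$(pick_queue 4 120)" script.pbs` would submit to the parallel queue under these assumptions.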
| + | |||
| + | Script example (script.pbs) | ||
| + | < | ||
| + | #PBS -q debug | ||
| + | #PBS -l select=2: | ||
| + | #PBS -A INFN_EURORA | ||
| + | ... | ||
| + | </ | ||
| + | |||
| + | Submit your job | ||
| + | qsub script.pbs | ||
| + | Monitor your job | ||
| + | qstat [-u username] | ||
| + | Cancel your job | ||
| + | qdel JOB.id | ||
| + | |||
| + | |||
| + | Interactive example (option -I): | ||
| + | |||
| + | qsub -q debug -l nodes=node021: | ||
| + | > cat $PBS_NODEFILE | ||
| + | > exit | ||
| + | |||
| + | Requesting more memory to allow demanding | ||
| + | qsub -q debug -l nodes=node021: | ||
| + | | ||
| + | | ||
| + | ==== Storage ==== | ||
| + | |||
| + | [[http:// | ||
| + | |||
| + | $HOME (/ | ||
| + | $CINECA_SCRATCH | ||
| + | | ||
| + | Use the local command " | ||
| + | |||
| + | cindata | ||
| + | |||
| + | ==== Software Environment ==== | ||
| + | |||
| + | * OS: RedHat CentOS release 6.3, 64 bit | ||
| + | * Compilers, scientific libraries and tools are installed using the **software modules** mechanism. | ||
| + | |||
| + | | ||
| + | |||
| + | NOTE: The MIC system libraries are distributed through the following shared directories: | ||
| + | |||
| + | * / | ||
| + | * / | ||
| + | * / | ||
| + | ===== Job submission ===== | ||
| + | |||
| + | Basic set of examples for the different programming models (CPU only, CPU+GPU, CPU+MIC) | ||
| + | |||
| + | ==== CPU ==== | ||
| + | |||
| + | Example of PBS file | ||
| + | < | ||
| + | #!/bin/bash | ||
| + | #PBS -l select=2: | ||
| + | #PBS -N d2d_bdir-remote | ||
| + | #PBS -l walltime=00: | ||
| + | #PBS -q debug | ||
| + | #PBS -A CON13_INFN | ||
| + | </ | ||
| + | |||
| + | ==== GPU ==== | ||
| + | |||
| + | http:// | ||
| + | |||
| + | ==Compilation== | ||
| + | |||
| + | * log in to a GPU node with the command | ||
| + | |||
| + | qsub -A CON13_INFN -I -l select=1: | ||
| + | |||
| + | * load necessary modules | ||
| + | |||
| + | | ||
| + | | ||
| + | ..... | ||
| + | |||
| + | * compile | ||
| + | * exit | ||
| + | |||
| + | ==Execution== | ||
| + | |||
| + | Example of PBS file | ||
| + | < | ||
| + | #!/bin/bash | ||
| + | #PBS -l select=2: | ||
| + | #PBS -N d2d_bdir-remote | ||
| + | #PBS -l walltime=00: | ||
| + | #PBS -q debug | ||
| + | #PBS -A CON13_INFN | ||
| + | |||
| + | # load required modules | ||
| + | module load gnu | ||
| + | module load cuda | ||
| + | |||
| + | mpirun ..... | ||
| + | </ | ||
| + | |||
| + | |||
| + | |||
| + | ==== MIC ==== | ||
| + | |||
| + | [[http:// | ||
| + | - | ||
| + | [[http:// | ||
| + | |||
| + | ==Compilation== | ||
| + | |||
| + | * log in to a MIC node with the command | ||
| + | |||
| + | qsub -A INFNG_test -I -l select=1: | ||
| + | |||
| + | * load needed modules and set variables | ||
| + | |||
| + | | ||
| + | | ||
| + | | ||
| + | |||
| + | * compile | ||
| + | * exit | ||
| + | |||
| + | ==Execution on mic-node== | ||
| + | |||
| + | qsub -A CON13_INFN -I -l select=1: | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | |||
| + | == Execution using PBS from front-end == | ||
| + | |||
| + | Example of PBS file | ||
| + | |||
| + | < | ||
| + | #!/bin/bash | ||
| + | #PBS -l select=1: | ||
| + | #PBS -l walltime=00: | ||
| + | #PBS -q debug | ||
| + | #PBS -A CON13_INFN | ||
| + | |||
| + | # load required modules | ||
| + | module load intel intelmpi mkl | ||
| + | source $INTEL_HOME/ | ||
| + | export I_MPI_MIC=enable | ||
| + | export MIC0=$(head -n 1 $PBS_NODEFILE | sed " | ||
| + | export MIC1=$(head -n 1 $PBS_NODEFILE | sed " | ||
| + | cd < | ||
| + | |||
| + | export MIC_PATH= | ||
| + | export MIC_PATH=$MIC_PATH:/ | ||
| + | export MIC_PATH=$MIC_PATH:/ | ||
| + | |||
| + | mpirun -genv LD_LIBRARY_PATH $MIC_PATH -host ${MIC0}, | ||
| + | </ | ||
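The job script above derives the two coprocessor host names from the first entry of `$PBS_NODEFILE`. A minimal sketch of that step follows. It assumes that node file entries look like `node021` or `node021.some.domain` and that the MICs follow the nodeXXX-mic0 / nodeXXX-mic1 naming given in the Architecture section. The function name `mic_hosts` is hypothetical, and the `sed` expression is an illustration, not a reconstruction of the one used in the script.

```shell
#!/bin/bash
# Hypothetical helper: print the two MIC host names for the first node
# listed in a PBS node file.
# Assumption: entries are short names or fully qualified names, and the
# coprocessors are called <node>-mic0 and <node>-mic1.
mic_hosts() {
    local nodefile=$1 host
    # Take the first line and strip any domain suffix after the first dot.
    host=$(head -n 1 "$nodefile" | sed 's/\..*$//')
    echo "${host}-mic0 ${host}-mic1"
}
```

Inside a job this could be used as `set -- $(mic_hosts "$PBS_NODEFILE"); export MIC0=$1 MIC1=$2`.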
| + | |||
| + | |||
| + | == Network fabrics == | ||
| + | |||
| + | http:// | ||
| + | |||
| + | Network fabrics available for the Intel Xeon Phi coprocessor: | ||
| + | |||
| + | The Intel MPI library tries to automatically use the best available network fabric detected (usually shm for intra-node communication and InfiniBand (dapl, ofa) for inter-node communication). | ||
| + | |||
| + | The default can be changed by setting the I_MPI_FABRICS environment variable to I_MPI_FABRICS=< | ||
| + | |||
| + | The availability is checked in the following order: shm:dapl, shm:ofa, shm:tcp. | ||
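As a small illustration of overriding the default, the fabric can be pinned before calling mpirun. The choice of `shm:tcp` here is only an example (shared memory inside a node, TCP between nodes); whether it is appropriate on Eurora is an assumption, since it bypasses InfiniBand.

```shell
#!/bin/bash
# Pin the fabric instead of letting Intel MPI pick the first available one.
# shm:tcp = shared memory within a node, TCP between nodes (illustrative choice).
export I_MPI_FABRICS=shm:tcp
echo "I_MPI_FABRICS=$I_MPI_FABRICS"
```

Forcing a slower fabric like this is mostly useful for debugging, for instance to check whether a failure depends on the InfiniBand path.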
| + | |||
| + | |||
| + | |||
| + | |||
| + | ---- | ||
| + | |||
| + | // 2013/ | ||
| + | |||
| + | |||
