A-Cluster

Linux cluster with currently 13 compute nodes (CPUs: 416 cores, GPUs: 8x RTX 2080 + 18x RTX 3090) and 2×251 TiB disk storage, purchased by Ana Vila Verde and Christopher Stein.

= Login =

The external hostname is <code>a-cluster.physik.uni-due.de</code> (134.91.59.16); the internal hostname is <code>stor2</code>.
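
For example (''username'' is a placeholder for your cluster account):

 ssh username@a-cluster.physik.uni-due.de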

= Queueing system: [https://slurm.schedmd.com/documentation.html Slurm] =

* There are two queues (''partitions'' in Slurm terminology) named:
** ''CPUs'', the default
** ''GPUs'', to be selected via <code>-p GPUs</code> for jobs which involve a GPU
* In the ''CPUs'' queue, 2 cores on each node stay reserved for GPU jobs, leaving 30 cores available per node.
* <code>[https://slurm.schedmd.com/sinfo.html sinfo]</code> displays the cluster's total load.
* <code>[https://slurm.schedmd.com/squeue.html squeue]</code> shows running jobs. You can modify its output via the option <code>-o</code>. To make that permanent, put something like <code>alias squeue='squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R %C %o"'</code> into your <code>.bashrc</code>.
* In the simplest cases, jobs are submitted via <code>[https://slurm.schedmd.com/sbatch.html sbatch] -n</code> ''n'' ''script-name''. The number ''n'' of CPUs is available within the script as <code>$SLURM_NTASKS</code>. It's not necessary to pass it on to <code>mpirun</code>, since the latter evaluates it on its own anyway. (See the example script after this list.)
* To allocate GPUs as well, add <code>-G </code>''n'' or <code>--gpus=</code>''n'' with ''n'' ∈ {1,2}. You can specify the type as well by prepending <code>rtx2080:</code> or <code>rtx3090:</code> to ''n''.
* Don't use background jobs (<code>&</code>), unless you <code>wait</code> for them before the end of the script.
* <code>[https://slurm.schedmd.com/srun.html srun]</code> is intended for interactive jobs (stdin+stdout+stderr stay attached to the terminal), and its <code>-n</code> doesn't just reserve ''n'' cores but starts ''n'' jobs. (Those shouldn't contain <code>mpirun</code>, otherwise you'd end up with ''n''² busy cores.)
* For an interactive shell with ''n'' reserved cores on a compute node: <code>srun --pty -c</code>''n''<code> bash</code>
* The assignment of cores can be non-trivial (cf. also [[Slurm/Task-Affinity|task affinity]]); some rules:
** gromacs: '''Don't''' use its <code>-pin</code> options.
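
A minimal batch script combining the options above (a sketch; <code>my_app</code> and the resource numbers are placeholders, assuming an MPI program):

 #!/bin/bash
 #SBATCH -n 8                # 8 cores, available in the script as $SLURM_NTASKS
 #SBATCH -p GPUs             # only for GPU jobs; omit to use the default CPUs queue
 #SBATCH --gpus=rtx3090:1    # one RTX 3090; omit for pure CPU jobs
 
 # mpirun picks the core count up from Slurm on its own,
 # so -np $SLURM_NTASKS is not needed.
 mpirun my_app

Options given on the <code>sbatch</code> command line take precedence over the <code>#SBATCH</code> lines.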

= Scientific Software =

The following is installed (on the ''compute nodes''):

== AMBER ==

The [https://modules.readthedocs.io/en/latest module system] is not involved. Instead, scripts provided by the software set the environment.

* <code>/usr/local/amber18</code>
* <code>/usr/local/amber20</code> (provides <code>parmed</code> as well)

Script to source therein (assuming [https://en.wikipedia.org/wiki/Bash_(Unix_shell) bash]): <code>amber.sh</code>
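
For example, to set up the AMBER 20 environment (a sketch, assuming bash):

 source /usr/local/amber20/amber.sh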

== GROMACS ==

The [https://modules.readthedocs.io/en/latest module system] is not involved. Instead, scripts provided by the software set the environment.

Versions (not all tested):

* <code>/usr/local/gromacs-2018.3</code>
* <code>/usr/local/gromacs-2020.4</code>
* <code>/usr/local/gromacs-3.3.4</code>
* <code>/usr/local/gromacs-4.6.4</code>
* <code>/usr/local/gromacs-5.0.1</code>
* <code>/usr/local/gromacs-5.1.1</code>

Script to source therein (assuming [https://en.wikipedia.org/wiki/Bash_(Unix_shell) bash]): <code>bin/GMXRC.bash</code>
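
For example, for the 2020.4 installation (a sketch, assuming bash):

 source /usr/local/gromacs-2020.4/bin/GMXRC.bash
 gmx --version    # quick sanity check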

Ana provided an [https://wiki.uni-due.de/vilaverde/index.php/File:Gromacs_cpu.sh example script] to be submitted via <code>sbatch</code>.

== OpenMolcas ==

(compiled with the Intel compiler and MKL)

Minimal example script to be <code>sbatch</code>ed:

 #!/bin/bash
 
 # installation root and a node-local scratch directory
 export MOLCAS=/usr/local/openmolcas
 export MOLCAS_WORKDIR=/tmp/$USER-$SLURM_JOB_NAME-$SLURM_JOB_ID
 mkdir "$MOLCAS_WORKDIR"
 export PATH=$PATH:$MOLCAS
 # runtime libraries of the Intel compiler and MKL
 export LD_LIBRARY_PATH=/opt/intel/oneapi/compiler/latest/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/mkl/latest/lib/intel64
 # as many OpenMP threads as cores requested via sbatch -n (default: 1)
 export OMP_NUM_THREADS=${SLURM_NTASKS:-1}
 
 pymolcas the_input.inp
 
 # clean up the scratch directory
 rm -rf "$MOLCAS_WORKDIR"
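
Saved as, e.g., <code>molcas_job.sh</code> (a hypothetical name), it could be submitted as follows; <code>-n</code> sets the thread count via <code>$SLURM_NTASKS</code>, and the job name and ID end up in the scratch directory's name:

 sbatch -n 4 -J mycalc molcas_job.sh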

If you want/need to use the module system instead of setting <code>LD_LIBRARY_PATH</code> manually:

 shopt -s expand_aliases
 source /etc/profile.d/modules.sh
 
 module use /opt/intel/oneapi/modulefiles
 module -s load compiler/latest
 module -s load mkl/latest

= Intel Compiler & Co. =

* is located in <code>/opt/intel/oneapi</code>
* must be made available via <code>module use /opt/intel/oneapi/modulefiles</code> (unless you include <code>/opt/intel/oneapi/modulefiles</code> in your <code>MODULEPATH</code>); then <code>module avail</code> lists the available modules.
* Module ''mkl/latest'' also contains FFT routines.
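
A minimal sketch of compiling against MKL (assuming the oneAPI C compiler driver <code>icx</code> and its <code>-qmkl</code> flag; <code>my_prog.c</code> is a placeholder):

 module use /opt/intel/oneapi/modulefiles
 module load compiler/latest mkl/latest
 icx -qmkl my_prog.c -o my_prog    # -qmkl links MKL, including its FFT routines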

= Backups =

A backup of the users' home directories is taken nightly. To access the backups, first log in to the cluster. Then:

* Users in <code>/home/stor.vd1</code>: Last night's backup is in <code>/export/vd1/$USER</code>.
* Users in <code>/home/stor1.lv0</code>: You have seven backups, covering the last 7 days, in <code>/exports/lv0/snapshots/days.''D''/stor1/home/stor1.lv0/$USER</code> with ''D'' ∈ {0,…,6}.
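
For example, to restore a file from three days ago for a user in <code>/home/stor1.lv0</code> (a sketch; <code>myfile</code> is a placeholder):

 cp /exports/lv0/snapshots/days.3/stor1/home/stor1.lv0/$USER/myfile ~/myfile.restored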
