A-Cluster
Linux cluster with currently 13 compute nodes (CPUs: 416 cores, GPUs: 8x RTX 2080 + 18x RTX 3090) and 2×251 TiB disk storage, purchased by Ana Vila Verde and Christopher Stein.
= Login =
The external hostname is <code>a-cluster.physik.uni-due.de</code> (134.91.59.16), the internal hostname is <code>stor2</code>.
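For example, from an external machine (<code>your_username</code> is a placeholder for your cluster account):
 # replace your_username with your cluster account name
 ssh your_username@a-cluster.physik.uni-due.de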
= Queueing system: [https://slurm.schedmd.com/documentation.html Slurm] =
* There are two queues (''partitions'' in Slurm terminology) named:
** ''CPUs'', the default
** ''GPUs'', to be selected via <code>-p GPUs</code> for jobs which involve a GPU
* In the ''CPUs'' queue, 2 cores stay reserved on each node for GPU jobs, resulting in 30 available cores per node.
* <code>[https://slurm.schedmd.com/sinfo.html sinfo]</code> displays the cluster's total load.
* <code>[https://slurm.schedmd.com/squeue.html squeue]</code> shows running jobs. You can modify its output via the option <code>-o</code>. To make that permanent, put something like <code>alias squeue='squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R %C %o"'</code> into your <code>.bashrc</code>.
* In the simplest cases, jobs are submitted via <code>[https://slurm.schedmd.com/sbatch.html sbatch] -n</code> ''n'' ''script-name''. The number ''n'' of CPUs is available within the script as <code>$SLURM_NTASKS</code>. It is not necessary to pass it on to <code>mpirun</code>, since the latter evaluates it on its own anyway. (A minimal example script follows this list.)
* To allocate GPUs as well, add <code>-G </code>''n'' or <code>--gpus=</code>''n'' with ''n'' ∈ {1,2}. You can also specify the GPU type by prepending <code>rtx2080:</code> or <code>rtx3090:</code> to ''n''.
* Don't use background jobs (<code>&</code>) unless you <code>wait</code> for them before the end of the script.
* <code>[https://slurm.schedmd.com/srun.html srun]</code> is intended for interactive jobs (stdin, stdout and stderr stay attached to the terminal), and its <code>-n</code> does not only reserve ''n'' cores but starts ''n'' jobs. (Those should not contain <code>mpirun</code>, otherwise you would end up with ''n''² busy cores.)
* For an interactive shell with ''n'' reserved cores on a compute node: <code>srun --pty -c</code>''n''<code> bash</code>
* The assignment of cores can be non-trivial (cf. also [[Slurm/Task-Affinity|task affinity]]); some rules:
** gromacs: '''Don't''' use its <code>-pin</code> options.
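A minimal CPU-job sketch following these rules (the script name <code>my_mpi_job.sh</code> and the program <code>my_program</code> are placeholders, not software installed on the cluster):
 #!/bin/bash
 # submitted e.g. via:  sbatch -n 8 my_mpi_job.sh
 # Slurm provides the requested core count as $SLURM_NTASKS;
 # mpirun picks the core count up from the Slurm environment on its own,
 # so no -np option is needed.
 echo "Running on $SLURM_NTASKS cores on $(hostname)"
 mpirun my_program input.dat
For a GPU job, select the other partition and request a card on top of that, e.g. <code>sbatch -p GPUs -n 2 -G rtx3090:1 my_gpu_job.sh</code>.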
= Scientific Software =
The following software is installed on the ''compute nodes'':
== AMBER ==
The [https://modules.readthedocs.io/en/latest module system] is not involved. Instead, scripts provided by the software set the environment.
* <code>/usr/local/amber18</code>
* <code>/usr/local/amber20</code> (provides <code>parmed</code> as well)
Script to source therein (assuming [https://en.wikipedia.org/wiki/Bash_(Unix_shell) bash]): <code>amber.sh</code>
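A sketch of a corresponding job script (assuming bash and AMBER 20; the <code>sander</code> input and output file names are placeholders):
 #!/bin/bash
 # set up the AMBER 20 environment
 source /usr/local/amber20/amber.sh
 # run a serial sander MD job (file names are placeholders)
 sander -O -i md.in -o md.out -p system.prmtop -c system.inpcrd
Submitted e.g. via <code>sbatch -n 1 amber_job.sh</code> (script name again a placeholder).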
== GROMACS ==
The [https://modules.readthedocs.io/en/latest module system] is not involved. Instead, scripts provided by the software set the environment.
Versions (not all tested):
* <code>/usr/local/gromacs-2018.3</code>
* <code>/usr/local/gromacs-2020.4</code>
* <code>/usr/local/gromacs-3.3.4</code>
* <code>/usr/local/gromacs-4.6.4</code>
* <code>/usr/local/gromacs-5.0.1</code>
* <code>/usr/local/gromacs-5.1.1</code>
Script to source therein (assuming [https://en.wikipedia.org/wiki/Bash_(Unix_shell) bash]): <code>bin/GMXRC.bash</code>
Ana provided an [https://wiki.uni-due.de/vilaverde/index.php/File:Gromacs_cpu.sh example script] to be submitted via <code>sbatch</code>.
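For reference, a minimal sketch along the same lines (assuming bash, GROMACS 2020.4 and a prepared run input <code>topol.tpr</code>; file names are placeholders):
 #!/bin/bash
 # set up the GROMACS 2020.4 environment
 source /usr/local/gromacs-2020.4/bin/GMXRC.bash
 # run on the cores granted by Slurm; do NOT add the -pin options (see above)
 gmx mdrun -nt ${SLURM_NTASKS:-1} -deffnm topol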
== OpenMolcas ==
(compiled with the Intel compiler and MKL)
Minimal example script to be <code>sbatch</code>ed:
 #!/bin/bash
 # root of the OpenMolcas installation
 export MOLCAS=/usr/local/openmolcas
 # per-job scratch directory on the node-local /tmp
 export MOLCAS_WORKDIR=/tmp/$USER-$SLURM_JOB_NAME-$SLURM_JOB_ID
 mkdir $MOLCAS_WORKDIR
 export PATH=$PATH:$MOLCAS
 # Intel runtime and MKL libraries (for the module-based alternative see below)
 export LD_LIBRARY_PATH=/opt/intel/oneapi/compiler/latest/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/mkl/latest/lib/intel64
 # use as many OpenMP threads as cores requested via sbatch -n
 export OMP_NUM_THREADS=${SLURM_NTASKS:-1}
 pymolcas the_input.inp
 # clean up the scratch directory
 rm -rf $MOLCAS_WORKDIR
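The scratch directory name uses <code>$SLURM_JOB_NAME</code>, which defaults to the script name; it can also be set explicitly at submission time, e.g. (script name is a placeholder):
 sbatch -n 4 -J mycalc molcas_job.sh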
If you want/need to use the module system instead of setting <code>LD_LIBRARY_PATH</code> manually:
 # enable alias expansion in the non-interactive job shell, then set up the module command
 shopt -s expand_aliases
 source /etc/profile.d/modules.sh
 module use /opt/intel/oneapi/modulefiles
 # -s suppresses the informational output
 module -s load compiler/latest
 module -s load mkl/latest
= Intel Compiler & Co. =
The Intel oneAPI installation
* is located in <code>/opt/intel/oneapi</code>,
* must be made available via <code>module use /opt/intel/oneapi/modulefiles</code> (unless you include <code>/opt/intel/oneapi/modulefiles</code> in your <code>MODULEPATH</code>); <code>module avail</code> then lists the available modules (see the sketch below).
* The module ''mkl/latest'' also contains FFT routines.
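A sketch of loading the compiler and MKL and building against them (the source file <code>prog.c</code> is a placeholder and the flags are illustrative; check Intel's documentation for the exact link line you need):
 # make the oneAPI modules available and load compiler + MKL
 module use /opt/intel/oneapi/modulefiles
 module load compiler/latest
 module load mkl/latest
 # compile with the oneAPI C compiler and link against MKL
 icx -O2 prog.c -qmkl -o prog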
= Backups =
A backup of the users' home directories is taken nightly. To access the backups, first log in to the cluster. Then:
* Users in <code>/home/stor.vd1</code>: Last night's backup is in <code>/export/vd1/$USER</code>.
* Users in <code>/home/stor1.lv0</code>: You actually have seven backups, corresponding to the last 7 days, in <code>/exports/lv0/snapshots/days.''D''/stor1/home/stor1.lv0/$USER</code> with ''D'' ∈ {0,…,6}.
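A sketch of restoring a single file from a backup (the path <code>project/params.dat</code> is a placeholder):
 # users in /home/stor.vd1: copy the file back from last night's backup
 cp /export/vd1/$USER/project/params.dat ~/project/params.dat
 # users in /home/stor1.lv0: pick one of the seven daily snapshots via D = 0..6
 D=0
 cp /exports/lv0/snapshots/days.$D/stor1/home/stor1.lv0/$USER/project/params.dat ~/project/params.dat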