A-Cluster

Linux cluster, currently with 13 compute nodes (CPUs: 416 cores; GPUs: 8x RTX 2080 + 18x RTX 3090), purchased by Ana Vila Verde and Christopher Stein.

Login

The external address is 134.91.59.31 (it will change soon and then get a hostname); the internal hostname is stor2.
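
For example (the account name below is a placeholder for your cluster account):

  ssh your-account@134.91.59.31    # external address, subject to change
  ssh your-account@stor2           # from within the internal network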

Queueing system: Slurm (documentation: https://slurm.schedmd.com/documentation.html)

  • Currently, there's just one partition: "a-cluster"
  • In the simplest cases, jobs are submitted via sbatch -n n script-name (a minimal script sketch follows this list).
  • srun is intended for interactive jobs (stdin+stdout+stderr stay attached to the terminal), and its -n doesn't just reserve n cores but starts the job n times. (Such jobs shouldn't contain mpirun, otherwise you'd end up with n² busy cores.)
  • Assigning cores to jobs can be non-trivial: see Slurm/Task-Affinity.
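
A minimal batch-script sketch for an MPI job; the job name, core count, and binary name (my_program) are placeholders, and the script assumes an MPI library with Slurm support so that mpirun picks up the core count from the allocation:

  #!/bin/bash
  #SBATCH --job-name=example       # placeholder job name
  #SBATCH --partition=a-cluster    # the only partition currently available
  #SBATCH -n 16                    # reserve 16 cores (may instead be passed to sbatch on the command line)

  # with Slurm-aware MPI, mpirun starts one rank per allocated core
  mpirun ./my_program

Submit it with sbatch script-name (or sbatch -n 16 script-name if the core count isn't set inside the script).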

Intel Compiler & Co.

  • is located in /opt/intel/oneapi
  • must be made available via module use /opt/intel/oneapi/modulefiles (unless /opt/intel/oneapi/modulefiles is already in your MODULEPATH); module avail then lists the available modules.
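
A typical session might look like the following; the module names (compiler, mkl) are examples only and depend on which oneAPI components are installed:

  module use /opt/intel/oneapi/modulefiles   # add the oneAPI modulefiles to MODULEPATH
  module avail                               # list the modules that are actually installed
  module load compiler mkl                   # example: load the compiler and MKL (names may differ)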