A-Cluster

From IT Physics

Revision as of 20 September 2021, 22:55

Linux cluster, currently with 13 compute nodes (CPUs: 416 cores in total; GPUs: 8× RTX 2080 and 18× RTX 3090), purchased by Ana Vila Verde and Christopher Stein.

Login

The external address is 134.91.59.31 (this will change soon, and the machine will then get a hostname); the internal hostname is stor2.
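For convenience, the login address can be stored in the SSH client configuration. This is only a sketch: the alias a-cluster and the user name are placeholders, and the HostName line has to be updated once the external address changes.

```
# ~/.ssh/config (alias "a-cluster" and user name are placeholders)
Host a-cluster
    HostName 134.91.59.31
    User your-username
```

With this entry, ssh a-cluster is equivalent to ssh your-username@134.91.59.31.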

Queueing system: Slurm

  • sinfo displays the cluster's total load.
  • squeue shows running jobs.
  • Currently, there's just one partition: "a-cluster"
  • In the simplest case, jobs are submitted via sbatch -n n script-name, where n is the number of tasks to reserve (by default one core each).
  • srun is intended for interactive jobs (stdin, stdout and stderr stay attached to the terminal), and its -n option doesn't only reserve n cores but also starts n tasks, i.e. n copies of the given command. (These therefore shouldn't contain mpirun; otherwise you'd end up with n² busy cores.)
  • Assigning cores to jobs can be non-trivial (see also task affinity); some rules:
    • gromacs: Don't use its -pin options.
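A minimal job script for the sbatch workflow above might look as follows. It is a sketch under assumptions: the job name, the output file name and the program ./my_mpi_program are hypothetical placeholders, and it assumes an MPI library whose mpirun picks up the Slurm allocation by itself (whether it does depends on how MPI was built).

```bash
#!/bin/bash
#SBATCH --job-name=example        # hypothetical job name
#SBATCH --partition=a-cluster     # currently the only partition
#SBATCH --ntasks=16               # equivalent to "sbatch -n 16"
#SBATCH --output=example-%j.out   # %j is replaced by the job ID

# Start one MPI process per reserved task. Assumes mpirun reads the
# Slurm allocation; do NOT start this via "srun -n ...", which would
# launch 16 copies of mpirun (16 x 16 = 256 processes).
mpirun ./my_mpi_program           # hypothetical executable
```

Save it, e.g., as example.sh (hypothetical name), submit it with sbatch example.sh, and check on it with squeue -u $USER. Options given on the sbatch command line override the #SBATCH lines, so sbatch -n 8 example.sh would request 8 tasks instead. By contrast, srun -n 4 hostname simply runs hostname four times, which shows why a command started with srun -n should not itself call mpirun.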

Intel Compiler & Co.

  • is located in /opt/intel/oneapi
  • must be made available via module use /opt/intel/oneapi/modulefiles (unless /opt/intel/oneapi/modulefiles is already in your MODULEPATH); afterwards, module avail lists the available modules.
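The steps above can be sketched as the following shell commands. The module name compiler/latest is an assumption about how the oneAPI modulefiles are laid out on this cluster; use whatever module avail actually lists.

```bash
# Make the oneAPI modulefiles visible for the current shell session
module use /opt/intel/oneapi/modulefiles

# List the modules that are now available
module avail

# Load a module (hypothetical name; pick one from the "module avail" output)
module load compiler/latest

# Alternative: put this line into ~/.bashrc so that "module use" is not needed;
# it prepends the directory and keeps any existing MODULEPATH entries
# export MODULEPATH=/opt/intel/oneapi/modulefiles${MODULEPATH:+:$MODULEPATH}
```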