Restarting belt automatically

From Arbeitsgruppe Kuiper

Add the following lines to your submit script to automatically restart belt from the last output file.

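# Number of the most recent output file in data/
latest=$(ls -l data | grep -o "rho\.[0-9]*\.dbl" | grep -oE "[0-9]*" | sort -g -r | head -n 1)

# Stop the run after 95 hours; timeout exits with status 124 when the limit is hit
timeout 95h mpirun -np 864 ./belt -restart "$latest"

# Resubmit only if the run was stopped by timeout
if [ $? -eq 124 ]; then
    sbatch submit
fi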

In the example above, timeout stops the code after 95 hours and the submit file called "submit" is resubmitted. Note that the Slurm time limit of the job must be longer than 95 hours for this to work. Otherwise, Slurm will terminate the job before timeout does, probably with a different exit code, and the job will not be resubmitted. So an option such as #SBATCH -t 4-00:00:00 (96 hours) should be used.
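
The restart logic above can also be wrapped in two small shell functions, which makes it easier to test outside of a batch job. This is only a sketch: the function names latest_output and timed_out are hypothetical and not part of belt. It assumes GNU timeout (which exits with status 124 when the time limit is reached) and output files named rho.NNNN.dbl as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; the names are not part of belt.

# Print the highest output number among the rho.NNNN.dbl files in a directory.
# Prints nothing if no such file exists.
latest_output() {
    local dir=$1
    ls "$dir" | sed -n 's/^rho\.\([0-9][0-9]*\)\.dbl$/\1/p' | sort -n | tail -n 1
}

# Run a command under a time limit.
# Returns success (0) only if the command was stopped because the limit was hit,
# i.e. GNU timeout exited with status 124.
timed_out() {
    local limit=$1
    shift
    local status=0
    timeout "$limit" "$@" || status=$?
    [ "$status" -eq 124 ]
}

# Possible use in the submit script (same values as in the example above):
#
#   if timed_out 95h mpirun -np 864 ./belt -restart "$(latest_output data)"; then
#       sbatch submit
#   fi
```

Because timed_out succeeds only on exit status 124, a run that finishes normally or crashes is not resubmitted.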