Restarting belt automatically
Add the following lines to your submit script to automatically restart belt from the latest output file.
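# Find the highest output index among the data/rho.NNNN.dbl files.
latest=$(ls -l data | grep -o "rho\.[0-9]*\.dbl" | grep -oE "[0-9]*" | sort -g -r | head -n 1)
# Run for at most 95 hours; timeout exits with status 124 when the limit is reached.
timeout 95h mpirun -np 864 ./belt -restart $latest
if [ $? -eq 124 ]; then
    # The run was cut off by timeout, so queue the next segment.
    sbatch submit
fi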
In the above example, the code is terminated after 95 hours and the submit script, here named "submit", is resubmitted. Note that the SBATCH time limit must be longer than 95 hours for this to work. Otherwise the batch system terminates the job before timeout does, most likely with a different exit code, and the job is not resubmitted. So an option such as #SBATCH -t 4-00:00:00 (4 days, i.e. 96 hours) should be used.
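The restart logic can be tried out without a cluster. The following is only a sketch under stated assumptions: a temporary directory stands in for data/, "sleep" stands in for the mpirun command, and a shell variable records the decision instead of calling sbatch. It assumes GNU coreutils timeout, as found on most Linux clusters.

```shell
#!/bin/bash
# Dry run of the restart logic with stand-ins (all of them assumptions for
# illustration): a temporary directory replaces data/, "sleep" replaces the
# mpirun command, and a variable replaces the sbatch call.
workdir=$(mktemp -d)
mkdir "$workdir/data"
touch "$workdir/data/rho.0003.dbl" "$workdir/data/rho.0010.dbl" "$workdir/data/rho.0007.dbl"

# Same pipeline as in the submit script: pick the highest output index.
latest=$(ls -l "$workdir/data" | grep -o "rho\.[0-9]*\.dbl" | grep -oE "[0-9]*" | sort -g -r | head -n 1)
echo "latest output index: $latest"

# A command that outlives its limit is killed, and timeout returns 124.
timeout 1s sleep 5
status=$?
echo "exit status after timeout: $status"

# Decide what to do next based on the exit status.
if [ "$status" -eq 124 ]; then
    action="resubmit"   # the real script would run: sbatch submit
else
    action="stop"       # belt finished or failed on its own
fi
echo "action: $action"

rm -rf "$workdir"
```

With the three dummy files above, the pipeline selects index 0010, and the one-second limit on a five-second sleep produces exit status 124, so the script takes the resubmit branch. If the command finishes before the limit, timeout passes through the command's own exit status and nothing is resubmitted.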