Restarting belt automatically
Latest revision as of 10:51, 3 July 2024
Add the following lines to your submit script to automatically restart belt from the latest output file.
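# Find the highest output number among the data/rho.NNNN.dbl files
latest=$(ls -l data | grep -o "rho\.[0-9]*\.dbl" | grep -oE "[0-9]*" | sort -g -r | head -n 1)
# Stop the run after 95 hours; timeout exits with status 124 when the limit is hit
timeout 95h mpirun -np 864 ./belt -restart $latest
# Resubmit only if the run was stopped by timeout
if [[ $? == 124 ]]; then
    sbatch submit
fi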
In the above example, belt is terminated after 95 hours and the submit script, named "submit", is resubmitted. Note that the Slurm time limit of the job must be longer than the 95-hour timeout for this to work. Otherwise Slurm terminates the job first, probably with a different exit code, and the job is not resubmitted. So use the option #SBATCH -t 4-00:00:00 (4 days, i.e. 96 hours).
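The two steps of the script (finding the latest output number and checking whether the run was stopped by timeout) can also be written as small bash functions, which makes them easy to try outside a batch job. This is only a sketch: the function names latest_output and timed_out are hypothetical and not part of belt, and it assumes GNU timeout, which exits with status 124 when the time limit is hit.

```shell
#!/bin/bash
# Sketch only: latest_output and timed_out are hypothetical helper names.

# Print the highest output number found among DIR/rho.NNNN.dbl files.
# Prints nothing if the directory contains no such file.
latest_output() {
    local dir=$1
    ls "$dir" | grep -o "rho\.[0-9]*\.dbl" | grep -oE "[0-9]+" | sort -g -r | head -n 1
}

# Succeed only if the given exit status is 124,
# the status GNU timeout returns when the time limit was reached.
timed_out() {
    [[ $1 -eq 124 ]]
}

# Intended use inside the submit script (same commands as above):
#   latest=$(latest_output data)
#   timeout 95h mpirun -np 864 ./belt -restart $latest
#   if timed_out $?; then
#       sbatch submit
#   fi
```

To check the behavior by hand, run timeout 1 sleep 3 followed by echo $? in a shell; it should print 124.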