Restarting belt automatically

From Arbeitsgruppe Kuiper

Latest revision as of 10:51, 3 July 2024

Add the following lines to your submit script to automatically restart belt from the latest output file.

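# Find the highest output number among the files data/rho.<number>.dbl
latest=$(ls -l data | grep -o "rho\.[0-9]*\.dbl" | grep -oE "[0-9]*" | sort -g -r | head -n 1)

# Run belt from that output; timeout stops it after 95 hours
timeout 95h mpirun -np 864 ./belt -restart "$latest"

# timeout exits with code 124 when the time limit was reached: resubmit the job
if [[ $? == 124 ]]; then
  sbatch submit
fi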
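The lookup of the latest output number can also be wrapped in a small function, which makes it easier to check that an output file was found before restarting. This is only a sketch: the function name latest_output_number is hypothetical and not part of belt, and it assumes the output files are named rho.<number>.dbl as in the pipeline above.

```shell
#!/bin/bash
# Hypothetical helper (not part of belt): print the highest output number
# among files named rho.<number>.dbl in the given directory (default: data).
# Prints nothing if the directory is missing or contains no such file.
latest_output_number() {
    local dir="${1:-data}"
    ls "$dir" 2>/dev/null \
        | grep -oE '^rho\.[0-9]+\.dbl$' \
        | grep -oE '[0-9]+' \
        | sort -n -r \
        | head -n 1
}

latest=$(latest_output_number data)
if [[ -z "$latest" ]]; then
    # No output file yet, so there is nothing to restart from.
    echo "no rho.<number>.dbl file found in data" >&2
fi
```

An empty result means that no output has been written yet, in which case passing it to -restart would not be meaningful.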

In the above example, timeout terminates the code after 95 hours and returns exit code 124, and the submit file called "submit" is then resubmitted. Note that the Slurm time limit of the job must be at least 95 hours for this to work. Otherwise, Slurm will terminate the job before timeout does, (probably) with a different exit code, and the job will not be resubmitted. So the option #SBATCH -t 4-00:00:00 (4 days, i.e. 96 hours) should be used.
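The exit-code check can be tried out quickly without submitting a job. The following sketch uses sleep as a stand-in for the mpirun call and a 1-second limit instead of 95 hours; the echo messages are placeholders for illustration only.

```shell
#!/bin/bash
# Stand-in for the long run: sleep would take 5 s, but timeout stops it after 1 s.
timeout 1s sleep 5
status=$?

if [[ $status == 124 ]]; then
    # GNU timeout returns 124 when it had to stop the command.
    echo "time limit reached, the real script would call: sbatch submit"
else
    echo "command ended on its own with exit code $status, no resubmission"
fi
```

If the command finishes before the limit, timeout passes on the command's own exit code, so a run that ends normally or crashes is not resubmitted.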