I'm trying to streamline my analyses using bwa and samtools on different servers, but I am only allowed to run big processes for 24 hours before they are stopped. The way around this is to have scripts/programs pick up from where they left off (like rsync can do). Is there a way to make bwa and samtools resume a calculation without having to completely redo a whole session?
That is such a strange policy! I've heard of prohibiting long-running processes on login nodes, but clusters are meant to handle jobs that take a long time to complete. Maybe try talking to your sysadmin to find out why processes are being restricted to 24h? Longer processes may just need special approval.
Same problem here :-) (see dariober's solution)
Sadly, many (?) NGS programs are not checkpoint-capable (there are some exceptions, e.g. most 10x Genomics software). So if a job is interrupted, you will have to start over.
If you share your code we can see if there are ways to optimize it in order to increase the speed. 24 hours is pretty long for alignment unless you are doing really large WGS or Hi-C datasets.
BWA for one species generally doesn't take that long; the issue is that my script automatically runs it for 30+ species. The samtools sort, however, can take a freakin' long time, and I have no idea how to accommodate that.
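Neither bwa nor samtools will resume in the middle of a run, but with 30+ species a driver script can at least remember which species are finished and skip them after a restart. Here is a rough Python sketch of that idea, not something either tool provides: it writes a `.done` marker next to each output only after the command exits successfully. The names `run_step` and `sort_all`, the `<species>.bam` input naming, and the marker suffix are all made up for illustration.

```python
import subprocess
from pathlib import Path


def run_step(cmd, output, runner=subprocess.run):
    """Run cmd unless a done-marker for output already exists.

    Returns True if the command ran, False if it was skipped.
    """
    output = Path(output)
    marker = output.with_name(output.name + ".done")
    if marker.exists():
        return False  # finished in an earlier session
    # check=True raises CalledProcessError on a non-zero exit,
    # so the marker below is never written for a failed step.
    runner(cmd, check=True)
    marker.touch()
    return True


def sort_all(species, runner=subprocess.run):
    """Sort one BAM per species, skipping species that are already done.

    Assumes hypothetical inputs named <species>.bam in the working directory.
    Returns the species actually processed in this session.
    """
    processed = []
    for name in species:
        out = f"{name}.sorted.bam"
        cmd = ["samtools", "sort", "-o", out, f"{name}.bam"]
        if run_step(cmd, out, runner=runner):
            processed.append(name)
    return processed
```

If the job is killed at the 24-hour limit, the species that was running has no marker and is redone from scratch on the next submission, while everything before it is skipped. The alignment step could be wrapped with `run_step` in the same way. Leftover temporary files from a killed sort may need to be removed before rerunning.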
You can use multiple cores for samtools sort as well. You have not given us any information about how you are running these jobs.
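For reference, samtools sort takes `-@` for the number of threads and `-m` for the memory allowed per thread. A minimal Python sketch that builds such a command is below. The function names, file names, and the defaults of 8 threads and 2G are placeholders, not recommendations.

```python
import subprocess


def sort_command(in_bam, out_bam, threads=8, mem_per_thread="2G"):
    """Build a multi-threaded samtools sort command as an argument list."""
    return [
        "samtools", "sort",
        "-@", str(threads),      # sorting and compression threads
        "-m", mem_per_thread,    # maximum memory per thread
        "-o", out_bam,
        in_bam,
    ]


def sort_bam(in_bam, out_bam, **kwargs):
    """Run the sort; raises CalledProcessError if samtools exits non-zero."""
    subprocess.run(sort_command(in_bam, out_bam, **kwargs), check=True)
```

Total memory use is roughly threads times the per-thread limit, so both values should stay within whatever the job is allowed to request.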