This may be really simple, but I'm having memory issues running the EvidentialGene tr2aacds.pl pipeline. I am running it through a PBS queue on our server. My node has 256 GB RAM and 64 CPUs. I keep getting an error saying PBS killed my job because its memory allocation was exhausted. The script I am running is below.
As you can see from the above, there is a MAXMEM setting (the flag is in MB) and also an NCPU setting. I have set both low, as a buffer so that the job isn't killed again; previously I was trying to use all 250 GB and 60 cores of my node. Even with memory and cores minimised as in the script above, it still chews up all the memory and dies. It seems to fail during the BLAST stage of the pipeline. Any ideas where I'm going wrong, or what's happening?
Evigene uses NCBI blastn with ncpu processes for an all-CDS x all-CDS blastn run. That can eat up more memory than you think if your transcript set is large (many millions, which is the recommended way) and has many near-identical coding sequences. The answer to this problem is to reduce ncpu until the run finishes. tr2aacds is efficient in its data reduction and will generally finish large sets of 10M transcripts in a few hours with 12 to 24 cores, on machines with 64 GB to 128 GB of memory. The more recent Evigene tr2aacds version now checks for failed blastn parts and reruns them (if not too many failed).
evigene16mar20.tar is the most recent public version.
This part of Evigene also runs cd-hit-est before blastn, and cd-hit-est respects the MAXMEM parameter. The blastn portion, however, has no memory setting, only the -ncpu setting to tune it to your system. So divide available memory by ncpu to estimate the amount per blastn process; the memory each process needs will increase with the size of your transcript set and with how near-identical the transcripts are.
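To make the "divide available memory by ncpu" estimate concrete, here is a rough sketch in Python. It is not part of Evigene; the function names are made up for illustration, and the 10 GB per-process figure used in the usage note is a purely hypothetical guess, since the real need depends on your transcript set. It assumes memory is shared evenly across blastn processes, which is only an approximation.

```python
def mem_per_process_gb(total_mem_gb: float, ncpu: int) -> float:
    """Even share of memory per blastn process (approximation only)."""
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    return total_mem_gb / ncpu


def max_ncpu(total_mem_gb: float, need_per_process_gb: float, cores: int) -> int:
    """Largest ncpu whose even memory share still covers the assumed
    per-process need, capped at the number of cores on the node."""
    if need_per_process_gb <= 0:
        raise ValueError("need_per_process_gb must be positive")
    by_memory = int(total_mem_gb // need_per_process_gb)
    # Never return less than 1: the pipeline needs at least one process.
    return max(1, min(cores, by_memory))


if __name__ == "__main__":
    # Node figures from the question: about 250 GB usable, 64 cores.
    for ncpu in (60, 20, 12):
        share = mem_per_process_gb(250, ncpu)
        print(f"ncpu={ncpu}: about {share:.1f} GB per blastn process")
```

For example, if you guessed that each blastn process might need 10 GB (hypothetical), `max_ncpu(250, 10, 64)` would suggest 25 processes. In practice you would still lower ncpu step by step until the run completes.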
If you have 250G available, why are you setting a ~5G limit in your actual command? If all those cores try to start blast jobs in parallel, then it's no wonder you are running out of RAM. You may want to try a more conservative limit, say 20 cores.