I am currently trying to run the pathfinding step in the PHG pipeline (v1.2) by using the -ImputePipelinePlugin -imputeTarget pathToVCF options, but might be running into memory issues.
I ran the command like this:
singularity exec -B /netscratch:/netscratch phg_1_2.simg /tassel-5-standalone/run_pipeline.pl -Xmx150G -debug -configParameters /PHG/pathfinding_config.txt -ImputePipelinePlugin -imputeTarget pathToVCF -endPlugin
I may be misunderstanding this, but shouldn't the -Xmx option limit the amount of memory the job can use? After about a day of running, my job now seems to be using 320 GB of memory, and I am worried this might increase even further and reach the memory limit of the machine I'm using, or cause other people's jobs running on it to crash.
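One thing worth checking is where the memory is actually sitting: -Xmx caps only the Java heap, and any external tool the pipeline launches as a separate process (such as an aligner) is not covered by that cap. A quick sketch to break the usage down by process name (the patterns "java" and "minimap2" are assumptions; adjust to whatever actually shows up in ps on your machine):

```shell
# Sum resident memory (RSS, in kB) of every process whose command
# name matches a pattern. This is a rough diagnostic sketch, not a
# PHG-provided tool.
sum_rss_kb() {
    ps -eo rss=,comm= | awk -v pat="$1" \
        '$2 ~ pat { total += $1 } END { print total + 0 }'
}

sum_rss_kb java      # memory governed (mostly) by -Xmx
sum_rss_kb minimap2  # separate process, not limited by -Xmx at all
```

If the second number dominates, the growth you are seeing is outside the JVM and -Xmx cannot restrain it.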
Is there a way to estimate roughly how much memory this job will use in the end?
For example, from the size of the pangenome fasta (78 GB)? I am currently running imputation for one sample (paired-end reads, 2 gzipped fastq files of ~30 GB each, ~430 million reads). In the config file I set
numThreads=70 (but did not include any Xmx parameter there).
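For a rough estimate of the kind asked about above, one assumption-heavy sketch is to treat the aligner's peak memory as the in-RAM reference index plus per-thread working buffers. The index-to-fasta ratio and per-thread figure below are guesses, not PHG- or minimap2-documented constants:

```python
def estimate_minimap2_gb(fasta_gb, threads, batch_gb=0.5, per_thread_gb=1.0):
    """Back-of-envelope peak memory (GB) for an alignment step.

    fasta_gb:      size of the pangenome fasta on disk
    threads:       alignment threads in use
    batch_gb:      assumed read mini-batch held in memory
    per_thread_gb: assumed working memory per alignment thread
    """
    index_gb = 3 * fasta_gb  # assumed index-to-fasta ratio (a guess)
    buffers_gb = threads * per_thread_gb + batch_gb
    return index_gb + buffers_gb

# The values from this thread: 78 GB pangenome, 70 threads.
print(estimate_minimap2_gb(78, 70))  # 304.5 GB under these assumptions
```

The constants are crude, but the result lands in the same ballpark as the usage reported here, which at least suggests the bulk of the memory belongs to alignment rather than to the Java heap.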
Does someone have prior experience with this?
Many thanks in advance!!
Based on the program name run_pipeline.pl, this appears to be a Perl script. The option you are referring to is for Java. Is that Perl script calling some Java code? Otherwise, including that option does nothing for the Perl code, unless it is required by singularity (not a user myself).
Does the amount of memory keep increasing, or does it start high and remain stable? Do you have a log file you can post? That may give us information on what is allocated by/for Singularity vs what is allocated for the PHG java code.
When it started, it relatively quickly went up to 200 GB, then steadily increased to the current 327 GB. So yes, it still seems to be increasing.
Do you mean the console output? It has already started minimap2, so that might be what is using so much memory?
(Apologies, the output is very long; it just keeps increasing the number of "Processed alignments", up to 1,946,000,000 so far.)
Can you trim some of the log output in the GitHub gist? We get the idea of what is happening.
The log shows minimap2 running with -t 126 (126 threads), so that may also have something to do with the memory usage.
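If the thread count is the culprit, the usual minimap2 levers are -t (threads) and -K (bases loaded per mini-batch, default 500M); whether the PHG config exposes a way to pass these through to its minimap2 invocation is something to check in the PHG documentation. A sketch of what a capped invocation would look like (file names here are placeholders, not the paths PHG actually uses):

```shell
# Assumptions: flag names are minimap2's own; fewer threads and a
# smaller -K mini-batch both reduce peak memory at some cost in speed.
THREADS=32   # instead of the auto-detected 126
BATCH=200M   # down from minimap2's 500M default

CMD="minimap2 -t ${THREADS} -K ${BATCH} pangenome.fa reads_1.fq.gz reads_2.fq.gz"
echo "$CMD"
```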
Done. Many apologies!
And thank you for checking.
Yes, there seems to be something up with that.