I ran MuTect2 on human exome sequencing data (30x) before, and without parallel processing it took me 5.5 days. That was still doable. However, now I am trying to run MuTect2 on my human WGS data (30x, BQSR BAM file ~120 GB), and the runtime estimate at the start is 500+ weeks!
I feel like even if I split by regions, or use the Queue framework offered by GATK, it will not improve the situation to a satisfying level... Here is my command:
java -jar $GATK -T MuTect2 -R $REF -I:tumor $BQSRTBam -I:normal $BQSRNBam --dbsnp $DBSNP --cosmic $cosmicData -o $mutect2VCF
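One common workaround is to scatter the run across intervals with `-L` and launch the jobs in parallel, then merge the per-chromosome VCFs afterwards. A minimal sketch of that idea is below; it reuses the variables from the command above, but the chromosome naming (`1..22, X, Y` vs. `chr1...`), the memory setting, and the output file names are assumptions that depend on your reference:

```shell
#!/bin/sh
# Sketch: one MuTect2 job per chromosome via -L, run in parallel.
# Assumes $GATK, $REF, $BQSRTBam, $BQSRNBam, $DBSNP, $cosmicData are set
# as in the original command, and that the reference uses "1..22, X, Y".
for CHR in $(seq 1 22) X Y; do
    java -Xmx8g -jar $GATK -T MuTect2 -R $REF \
        -I:tumor $BQSRTBam -I:normal $BQSRNBam \
        --dbsnp $DBSNP --cosmic $cosmicData \
        -L $CHR -o mutect2_chr${CHR}.vcf &
done
wait  # block until all per-chromosome jobs finish
```

The per-chromosome VCFs can then be combined (e.g. with CatVariants or vcf-concat). With 24 concurrent jobs this only buys roughly an order of magnitude, so for WGS you may want a finer scatter (many small interval lists) on a cluster.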
I tried different memory settings, and running with/without dbsnp/cosmic; oddly, going without cosmic seems to make MuTect2 give an even longer estimate.
Has anyone experienced similar issues? Any suggestion is appreciated. Thank you very much in advance.
I have a little experience with VarScan2 - don't you need to use samtools mpileup? I recall the pileup step being painfully slow too... Maybe I did not do it correctly, but please comment on it. Thank you.
Yes, you are right, samtools mpileup is needed. But samtools mpileup + VarScan2 is still much, much faster than MuTect2.
It is possible to pipe the processes together to avoid creating the intermediate files.
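For example, VarScan2's `somatic` command can read a two-sample mpileup from stdin when given `--mpileup 1`, so the pileup never has to be written to disk. A sketch, where the BAM and reference file names are placeholders:

```shell
#!/bin/sh
# Sketch: stream samtools mpileup straight into VarScan2 somatic,
# skipping the intermediate pileup file. File names (ref.fa,
# normal.bam, tumor.bam, VarScan.jar path) are assumptions.
samtools mpileup -f ref.fa -q 1 -B normal.bam tumor.bam |
    java -jar VarScan.jar somatic varscan_out --mpileup 1 \
        --output-snp varscan_out.snp --output-indel varscan_out.indel
```

Note the normal BAM must come before the tumor BAM in the mpileup call, since VarScan assumes that column order.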