Hi guys,
I'm trying to use GATK3 (since v4 doesn't have the DepthOfCoverage tool yet) to calculate the depth of coverage across some WGS BAM files, with and with out soft clipping.
Currently, I'm using the following the following
#calculate depth, exclude soft-clipped bases
java -Xmx${mem}g -jar $jar -mbq 0 -mmq 0 -L ${bed} -o $out/${bamName}.doc -I $bam -T DepthOfCoverage -R $ref
#calculate depth, include soft-clipped bases
java -Xmx${mem}g -jar $jar -T ClipReads -l INFO -I $bam -o $bam.noclip.bam -R $ref -CR REVERT_SOFTCLIPPED_BASES -os clip.stats 2>> err.sc >> out.sc
java -Xmx${mem}g -jar $jar -mbq 0 -mmq 0 -L $bed -o $out/${bamName}.noclip.bam.doc -I $bam.noclip.bam -T DepthOfCoverage -R $ref
Basically, what I'm asking GATK to do is: 1) traverse the original BAM file and calculate the depth; 2) convert this original BAM file into a new BAM file with no soft clipping; 3) traverse this new BAM file and calculate the depth.
This is a computationally intensive process; it takes both large amounts of time and memory. And I'm wondering if there's a more efficient way to do it? If there's another tool, like samtools depth
, that could accomplish this more efficiently I'm happy to switch to that or anything else.
Thanks!
I understand that I may have to use GATK
-T ClipReads -CR REVERT_SOFTCLIPPED_BASES
to create a BAM with no soft-clipping from my original BAM, but won't some ways of calculating depth across this original and modified BAM be more efficient than others?Alternatively, is there no tool that can, processing a BAM just once, tell me how what the depth w and w/o soft-clipping is at each spot? Or just the depth and the amount of soft-clipping and I'll add these numbers together? Thanks for your help.
it sounds like microoptimization and you'll consume some time and storage to run GATK
what would be the logic in this ? if a part of the read is mapped on another region of the genome, why would you need to consider this ?