Does bamtools split preserve sort and rmdup?
4.8 years ago
ccnn

Taking over someone else's code, I've discovered a bottleneck. This is a portion of a script that accepts a BAM and outputs smaller, sorted, rmdup-ed, indexed BAMs for each chromosome.

    for chr in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M; do
        samtools view -@ $threads -b $bam $chr > $bam.$chr.bam
        samtools index $bam.$chr.bam
        samtools rmdup $bam.$chr.bam $bam.$chr.rmdup.bam
        samtools index $bam.$chr.rmdup.bam
    done

After reading this Biostars question, I replaced the loop and the samtools view with

    bamtools split -in $bam -reference


Though it can't be multithreaded, this does seem to be faster, at least in my benchmarking so far. I'm wondering, though, whether the BAMs it produces are guaranteed to be sorted and rmdup-ed if the input $bam was. Or would I still need to run samtools rmdup (or samtools sort) on each child BAM?

bamtools samtools bam alignment genome
4.8 years ago

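bamtools split isn't going to change flags or record order, so yes, you can assume the per-chromosome BAMs are still sorted and rmdup-ed if the input was. As a sanity check, if the order had been disturbed too much, samtools index would fail on the split files.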
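If you want to confirm this on your own data, a minimal sketch along these lines splits the BAM and then indexes every piece, stopping at the first file that fails to index. The function name split_and_index is hypothetical, and the glob assumes bamtools writes its output next to the input as `<input without .bam>.REF_<name>.bam`, which is my understanding of its default naming; check what it actually produces on your system.

```shell
# Sketch only: split an already sorted, rmdup-ed BAM by reference,
# then index each piece as a cheap check that coordinate order survived.
# Assumption: bamtools names its outputs <stub>.REF_<reference>.bam,
# where <stub> is the input path without the .bam extension.
split_and_index() {
    local bam=$1
    local stub=${bam%.bam}
    local part count=0

    # One pass over the input, one output BAM per reference sequence
    bamtools split -in "$bam" -reference || return 1

    for part in "$stub".REF_*.bam; do
        # Skip the literal pattern if the glob matched nothing
        [ -e "$part" ] || continue
        # samtools index is expected to fail on a BAM that is not coordinate-sorted
        samtools index "$part" || { echo "index failed (unsorted?): $part" >&2; return 1; }
        count=$((count + 1))
    done

    [ "$count" -gt 0 ] || { echo "no split files found for $bam" >&2; return 1; }
    echo "indexed $count per-reference BAMs"
}
```

Calling `split_and_index "$bam"` would replace the whole loop from the question. Note that a successful index only tells you the order is intact; it says nothing about duplicates, so the input has to be rmdup-ed before splitting.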