Question: Does bamtools split preserve sort order and rmdup status?
ccnn wrote (17 months ago):

While taking over someone else's code, I found a bottleneck. It is in a portion of a script that takes a BAM and outputs smaller, sorted, rmdup-ed (duplicate-removed), indexed BAMs, one per chromosome.

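# $bam must be coordinate-sorted and indexed for the region queries below.
for chr in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M; do

    # Extract one chromosome (names must match the @SQ names in the BAM header)
    samtools view -@ "$threads" -b "$bam" "$chr" > "$bam.$chr.bam"
    samtools index "$bam.$chr.bam"
    # Remove PCR duplicates, then index the deduplicated file
    samtools rmdup "$bam.$chr.bam" "$bam.$chr.rmdup.bam"
    samtools index "$bam.$chr.rmdup.bam"

done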

After reading this Biostars question, I replaced the loop and its samtools view calls with:

bamtools split -in "$bam" -reference
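
A minimal sketch of how the split step could slot into the rest of the pipeline, assuming the input is already sorted and rmdup-ed. The helper name split_and_index is hypothetical, and the glob assumes bamtools' default output naming of <stub>.REF_<reference>.bam; the indexing loop mirrors what the original script did for each chromosome.

```shell
# split_and_index: hypothetical helper, not part of bamtools or samtools.
# Assumes IN_BAM is already coordinate-sorted and duplicate-removed, and that
# bamtools names its outputs <stub>.REF_<reference>.bam (its default scheme).
split_and_index() {
  local in_bam="$1"
  local stub="${in_bam%.bam}"          # e.g. sample.bam -> sample
  local f
  # One output BAM per reference; bamtools split runs single-threaded.
  bamtools split -in "$in_bam" -reference -stub "$stub" || return 1
  # Index each per-reference file, as the original loop did.
  for f in "$stub".REF_*.bam; do
    [ -e "$f" ] || continue            # glob matched nothing
    samtools index "$f" || return 1
  done
}

# Usage: split_and_index "$bam"
```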

Though it can't be multithreaded, it does seem faster, at least in my benchmarking so far. But if the input $bam is sorted and rmdup-ed, are the BAMs it produces guaranteed to be sorted and rmdup-ed too? Or would I still need to run samtools rmdup (or samtools sort) on each child BAM?
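
One way to check the sort order empirically is to look at the records themselves rather than the @HD SO tag, since a header copied from the input would carry that tag whether or not the records are actually in order. Below is a minimal sketch of a hypothetical helper, check_coord_sorted, that reads headerless SAM from samtools view and verifies that each reference forms one contiguous block with non-decreasing POS. It is a sanity check, not a full validation (for example, it does not compare the block order with the @SQ order).

```shell
# check_coord_sorted: hypothetical helper. Reads SAM records (no header) on
# stdin and exits 0 only if they look coordinate-sorted: each RNAME forms one
# contiguous block, POS never decreases inside a block, and records with
# RNAME "*" (no reference) appear only at the end.
check_coord_sorted() {
  awk -F '\t' '
    $3 == "*" { seen_star = 1; next }
    seen_star {
      print "mapped record after unplaced block at line " NR > "/dev/stderr"
      bad = 1; exit
    }
    $3 != cur {
      if ($3 in done) {
        print "reference " $3 " appears in more than one block (line " NR ")" > "/dev/stderr"
        bad = 1; exit
      }
      if (cur != "") done[cur] = 1
      cur = $3; last = 0
    }
    $4 + 0 < last {
      print "position decreases at line " NR > "/dev/stderr"
      bad = 1; exit
    }
    { last = $4 + 0 }
    END { exit bad }
  '
}

# Usage: samtools view "${bam%.bam}.REF_1.bam" | check_coord_sorted && echo sorted
```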

Devon Ryan (Freiburg, Germany) wrote (17 months ago):

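It won't change flags or record order, so yes, you can assume it behaves as you want: each per-reference BAM keeps the input's sort order, and since records are only distributed among the output files (not added), duplicates removed from the input stay removed. If the order were broken, samtools index would fail on the output.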
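
Because split only routes each record to a per-reference file, the output files together should hold exactly the input's records. A minimal sketch to confirm that on real data, using a hypothetical helper records_conserved that compares samtools view -c on the input with the summed counts of the split files. It assumes bamtools' default <stub>.REF_<name>.bam naming and that reads without a reference also land in a .REF_ file the glob picks up; if they do not, the totals will differ by the number of unplaced reads.

```shell
# records_conserved: hypothetical helper. Compares the record count of the
# input BAM with the summed counts of <stub>.REF_*.bam files from bamtools split.
# Returns 0 if they match, non-zero otherwise.
records_conserved() {
  local in_bam="$1" stub="$2"
  local total parts=0 n f
  total=$(samtools view -c "$in_bam") || return 1
  for f in "$stub".REF_*.bam; do
    [ -e "$f" ] || continue            # glob matched nothing
    n=$(samtools view -c "$f") || return 1
    parts=$((parts + n))
  done
  echo "input: $total, split total: $parts"
  [ "$total" -eq "$parts" ]
}

# Usage: records_conserved "$bam" "${bam%.bam}"
```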
