Fast way to sort bam file by queryname similar to picard SortSam SORT_ORDER=queryname?
7 weeks ago
kalavattam ▴ 80

When sorting by queryname with Samtools (samtools sort -n), Samtools does a natural sort by colon-delimited subfield. On the other hand, when sorting by queryname with Picard (picard SortSam SORT_ORDER=queryname), Picard does not sort by colon-delimited subfield; instead it treats the queryname as a single field and sorts in ASCII order (for example, as described in this comment and its sub-comments).
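To make the difference concrete, here is a minimal sketch in C. Note that `natural_cmp` and `orders_differ` are hypothetical helpers written for illustration: `natural_cmp` is a simplified natural comparison (digit runs compared by numeric value), not Samtools' own comparator, and `strcmp` stands in for a plain ASCII comparison of the whole name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical simplified natural comparison (illustration only, not the
   Samtools implementation): runs of digits are compared by numeric value,
   all other characters are compared byte by byte. */
static int natural_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            /* Skip leading zeros so that "007" and "7" compare as equal numbers. */
            while (*a == '0') a++;
            while (*b == '0') b++;
            const char *sa = a, *sb = b;
            while (isdigit((unsigned char)*a)) a++;
            while (isdigit((unsigned char)*b)) b++;
            size_t la = (size_t)(a - sa), lb = (size_t)(b - sb);
            /* A longer digit run (without leading zeros) is the larger number. */
            if (la != lb) return la < lb ? -1 : 1;
            int c = memcmp(sa, sb, la);
            if (c != 0) return c < 0 ? -1 : 1;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    if (*a) return 1;   /* a has characters left, so it sorts after b */
    if (*b) return -1;  /* b has characters left, so it sorts after a */
    return 0;
}

static int sign(int x) { return (x > 0) - (x < 0); }

/* Returns 1 if the natural order and the plain byte order disagree
   about which of the two names comes first, 0 otherwise. */
int orders_differ(const char *a, const char *b)
{
    return sign(natural_cmp(a, b)) != sign(strcmp(a, b));
}
```

With the made-up names `read:2:1` and `read:10:1`, the natural comparison puts `read:2:1` first because 2 < 10, while `strcmp` puts `read:10:1` first because the character `1` sorts before `2`.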

I would like to sort my bam files in the picard SortSam SORT_ORDER=queryname manner, but Picard SortSam is quite a bit slower than samtools sort -n: samtools sort -n can be parallelized while picard SortSam SORT_ORDER=queryname can't. Is there a fast alternative to picard SortSam SORT_ORDER=queryname for this task?

bam picard sort samtools • 351 views
7 weeks ago

I don't think there is any software that does this "fast". You could fork samtools and change the function that compares the read names here:

    if (g_is_by_qname) {
        int t = strnum_cmp(bam_get_qname(a.bam_record), bam_get_qname(b.bam_record));
        if (t != 0) return t;
        return (int) (a.bam_record->core.flag&0xc0) - (int) (b.bam_record->core.flag&0xc0);

strnum_cmp is implemented here
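As a rough sketch of what such a change could look like, the comparator below swaps the natural comparison for a plain `strcmp` on the read name. It is self-contained and does not use htslib: `read_rec`, `cmp_qname_bytes` and `sort_by_qname_bytes` are hypothetical names made up for this example, and `read_rec` only stands in for the fields of a real BAM record that the comparator reads. The tie-break on the flag bits mirrors the excerpt above; I have not checked that it matches how Picard breaks ties between records with identical names, so treat that part as an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a BAM record: only the fields the comparator needs. */
typedef struct {
    const char *qname;  /* read name */
    unsigned flag;      /* SAM flag bits */
} read_rec;

/* Compare names byte by byte with strcmp instead of a natural comparison.
   Ties are broken on the read1/read2 bits (0x40 and 0x80), as in the excerpt. */
static int cmp_qname_bytes(const void *pa, const void *pb)
{
    const read_rec *a = pa;
    const read_rec *b = pb;
    int t = strcmp(a->qname, b->qname);
    if (t != 0) return t;
    return (int)(a->flag & 0xc0) - (int)(b->flag & 0xc0);
}

/* Sort an in-memory array of records by queryname in byte (ASCII) order. */
void sort_by_qname_bytes(read_rec *recs, size_t n)
{
    assert(recs != NULL || n == 0);
    qsort(recs, n, sizeof *recs, cmp_qname_bytes);
}
```

In a real fork the same idea would presumably amount to replacing the `strnum_cmp` call in the excerpt with `strcmp` on the two `bam_get_qname(...)` results, then checking the output against picard SortSam on a small file.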

