Fast way to sort bam file by queryname similar to picard SortSam SORT_ORDER=queryname?
7 weeks ago
kalavattam ▴ 80

When sorting by queryname with Samtools (samtools sort -n), Samtools does a natural sort by colon-delimited subfield. On the other hand, when sorting by queryname with Picard (picard SortSam SORT_ORDER=queryname), Picard does not sort by colon-delimited subfield; instead, it treats the queryname as a single field and sorts in ASCII order (for example, as described in this comment and its sub-comments).
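
To make the difference concrete, here is a minimal sketch in C that contrasts the two orderings on a pair of made-up read names. The function natural_cmp is a simplified, hypothetical stand-in that compares runs of digits as numbers; it is not samtools' actual strnum_cmp, and plain strcmp is only assumed to approximate the ASCII ordering described above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Simplified natural comparison (illustration only, NOT samtools' strnum_cmp):
 * runs of digits are compared by numeric value, all other bytes one by one. */
static int natural_cmp(const char *a, const char *b)
{
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            /* Leading zeros do not change the numeric value. */
            while (*a == '0') a++;
            while (*b == '0') b++;
            /* Measure the remaining digit runs. */
            size_t la = 0, lb = 0;
            while (isdigit((unsigned char)a[la])) la++;
            while (isdigit((unsigned char)b[lb])) lb++;
            /* A longer run of digits is a larger number. */
            if (la != lb) return la < lb ? -1 : 1;
            /* Equal length: byte order equals numeric order. */
            int c = strncmp(a, b, la);
            if (c != 0) return c < 0 ? -1 : 1;
            a += la;
            b += lb;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    if (*a) return 1;   /* a is longer */
    if (*b) return -1;  /* b is longer */
    return 0;
}
```

With the hypothetical names "read:9" and "read:10", natural_cmp places "read:9" first (9 < 10), whereas strcmp places "read:10" first because the byte '1' sorts before '9'.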

I would like to sort my BAM files in the picard SortSam SORT_ORDER=queryname manner, but Picard SortSam is quite a bit slower than samtools sort -n: samtools sort -n can be parallelized, while picard SortSam SORT_ORDER=queryname can't. Is there a fast alternative to picard SortSam SORT_ORDER=queryname for this task?

bam picard sort samtools • 351 views
7 weeks ago

I don't think there is any software that does this "fast". You could fork samtools and change the function that compares the read names here:

https://github.com/samtools/samtools/blob/develop/bam_sort.c#L1796

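    if (g_is_by_qname) {
        /* Compare the read names with samtools' natural-order comparator. */
        int t = strnum_cmp(bam_get_qname(a.bam_record), bam_get_qname(b.bam_record));
        if (t != 0) return t;
        /* Same name: order by the first/second-in-pair flag bits (0x40 | 0x80). */
        return (int) (a.bam_record->core.flag&0xc0) - (int) (b.bam_record->core.flag&0xc0);
    }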

strnum_cmp is implemented here: https://github.com/samtools/samtools/blob/401e254877f3d57660fb848e27c23f4439297da8/bam_sort.c#L107
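
As a rough sketch of what such a change could look like, the comparator below replaces the natural-order comparison with a plain byte-wise strcmp and applies it to an array of names with qsort. The names qname_cmp_ascii, qsort_adapter and sort_qnames_ascii are hypothetical and are not part of samtools. Whether byte order reproduces Picard's output exactly, including the order of records that share a name, is an assumption that should be checked against real Picard output.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical replacement for strnum_cmp: plain byte-wise comparison,
 * so "r:10" sorts before "r:9". */
static int qname_cmp_ascii(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* qsort passes pointers to the array elements, which are char pointers. */
static int qsort_adapter(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return qname_cmp_ascii(a, b);
}

/* Sort an array of n read names in place, in byte order. */
static void sort_qnames_ascii(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, qsort_adapter);
}
```

In a samtools fork, the equivalent step would be to call strcmp instead of strnum_cmp in the g_is_by_qname branch quoted above; the multithreaded sorting machinery would stay as it is.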
