Fast way to sort bam file by queryname similar to picard SortSam SORT_ORDER=queryname?
22 months ago
kalavattam ▴ 190

When sorting by queryname with Samtools (samtools sort -n), Samtools does a natural sort by colon-delimited subfield. On the other hand, when sorting by queryname with Picard (picard SortSam SORT_ORDER=queryname), Picard does not sort by colon-delimited subfield; instead, it treats the queryname as a single field and sorts it in ASCII order (for example, as described in this comment and its sub-comments).
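To make the difference concrete, here is a minimal sketch in C that sorts a few made-up read names both ways. `natural_cmp` is a simplified, hypothetical digit-aware comparison written for illustration; it is not samtools' actual `strnum_cmp` and may differ from it in edge cases such as leading zeros. `ascii_cmp` is plain `strcmp`, which for ASCII-only names should give the byte-wise ordering described above.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Simplified natural comparison (illustrative only, not samtools' code):
 * runs of digits are compared by numeric value, everything else byte by byte. */
int natural_cmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    while (*pa && *pb) {
        if (isdigit(*pa) && isdigit(*pb)) {
            /* Skip leading zeros so that the run length reflects magnitude. */
            while (*pa == '0') ++pa;
            while (*pb == '0') ++pb;
            size_t la = 0, lb = 0;
            while (isdigit(pa[la])) ++la;
            while (isdigit(pb[lb])) ++lb;
            /* A longer digit run means a larger number. */
            if (la != lb) return la < lb ? -1 : 1;
            int c = memcmp(pa, pb, la);
            if (c != 0) return c < 0 ? -1 : 1;
            pa += la;
            pb += lb;
        } else {
            if (*pa != *pb) return *pa < *pb ? -1 : 1;
            ++pa;
            ++pb;
        }
    }
    /* The string that still has characters left sorts last. */
    return *pa ? 1 : (*pb ? -1 : 0);
}

/* Whole-name comparison in byte (ASCII) order. */
int ascii_cmp(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* qsort adapters: qsort passes pointers to the array elements. */
int qsort_natural(const void *x, const void *y)
{
    return natural_cmp(*(const char *const *)x, *(const char *const *)y);
}

int qsort_ascii(const void *x, const void *y)
{
    return ascii_cmp(*(const char *const *)x, *(const char *const *)y);
}
```

Sorting the names `r:1:10`, `r:1:9`, `r:1:2` with `qsort_natural` puts `r:1:9` before `r:1:10`, while `qsort_ascii` puts `r:1:10` first because the character `1` sorts before `9`.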

I would like to sort my bam files in the picard SortSam SORT_ORDER=queryname manner, but picard SortSam is quite a bit slower than samtools sort -n: samtools sort -n can be parallelized, while picard SortSam SORT_ORDER=queryname can't. Is there a fast alternative to picard SortSam SORT_ORDER=queryname for this task?

picard samtools bam • 844 views
22 months ago

I don't think there is any software that does this "fast". You could fork samtools and change the function that compares the read names, here:

https://github.com/samtools/samtools/blob/develop/bam_sort.c#L1796

    if (g_is_by_qname) {
        int t = strnum_cmp(bam_get_qname(a.bam_record), bam_get_qname(b.bam_record));
        if (t != 0) return t;
        return (int) (a.bam_record->core.flag&0xc0) - (int) (b.bam_record->core.flag&0xc0);

strnum_cmp is implemented here https://github.com/samtools/samtools/blob/401e254877f3d57660fb848e27c23f4439297da8/bam_sort.c#L107
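As a rough sketch of what such a fork might change, the comparison could call `strcmp` instead of `strnum_cmp` and keep the existing flag tie-break. The `mock_read` struct and `cmp_by_qname_ascii` function below are hypothetical stand-ins so the example compiles without htslib; in a real fork the names would come from `bam_get_qname` and the flag from `core.flag`. This assumes ASCII-only read names, and I have not checked that Picard breaks ties between identical names the same way, so the output may still differ from Picard's for mates, secondary, or supplementary records.

```c
#include <string.h>

/* Hypothetical stand-in for the two BAM record fields the comparison reads. */
typedef struct {
    const char *qname;  /* read name */
    unsigned flag;      /* SAM flag bits */
} mock_read;

/* Compare whole read names in byte order, then fall back to the
 * first-in-pair / second-in-pair bits (0x40 / 0x80), as in the excerpt above. */
int cmp_by_qname_ascii(const mock_read *a, const mock_read *b)
{
    int t = strcmp(a->qname, b->qname);  /* replaces strnum_cmp */
    if (t != 0) return t;
    return (int)(a->flag & 0xc0) - (int)(b->flag & 0xc0);
}
```

With this comparison, a read named `r:1:10` sorts before `r:1:9`, and for two records with the same name the one flagged first-in-pair sorts before the one flagged second-in-pair.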
