Question: Is My Bam File Sorted ?
39
gravatar for Pierre Lindenbaum
8.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum121k wrote:

Hi all,

I wonder if there is any tool (in samtools ?) to quickly know whether a BAM file has been sorted or not.

(longer story: I've merged a set of sorted bams using 'samtools merge' and I've then used MarkDuplicates with the option: 'assume_sorted=true'. Am I right ?)

Thanks,

Pierre

UPDATE: I wrote a post about it : http://plindenbaum.blogspot.com/2011/02/testing-if-bam-file-is-sorted-using.html

sequencing next-gen bam sort • 47k views
ADD COMMENTlink modified 2.3 years ago by martin.triska110 • written 8.5 years ago by Pierre Lindenbaum121k
1

For those just looking for a quick way to do this using modern samtools, see martin.triska's answer further down.

ADD REPLYlink modified 17 months ago • written 17 months ago by wtwhite10

Hi Pierre, do you have similar solution to check if sam file is sorted or not? Thank you

ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by Prakki Rama2.3k

just pipe your SAM into 'samtools view' before using my program.

ADD REPLYlink written 5.7 years ago by Pierre Lindenbaum121k

got it. Thank you :)

ADD REPLYlink written 5.7 years ago by Prakki Rama2.3k

Hi Pierre, You didn't give us the explicit answer yet? Do we still need sort the bam after we 'samtools merge' sorted small bams? I made a small test and find the merged sorted bam are already sorted.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Shicheng Guo7.6k
29
gravatar for Brad Chapman
8.5 years ago by
Brad Chapman9.4k
Boston, MA
Brad Chapman9.4k wrote:

You can use the sort order (SO) flag in the header to check if the file has been sorted:

% samtools view -H 5_110118_FC62VT6AAXX-hg18-unsort.bam
@HD    VN:1.0    SO:unsorted
% samtools view -H 5_110118_FC62VT6AAXX-hg18-sort.bam
@HD    VN:1.0    SO:coordinate

Unfortunately samtools index will work on both types of files without raising a commandline error, but will produce a smaller index file which is indicative of a problem:

% samtools index 5_110118_FC62VT6AAXX-hg18-sort.bam 
% echo $? 
0
% samtools index 5_110118_FC62VT6AAXX-hg18-unsort.bam 
% echo $?
0
% ls -lh *.bai 
chapman users 6.2M 2011-02-01 06:46 5_110118_FC62VT6AAXX-hg18-sort.bam.bai
chapman users  408 2011-02-01 06:47 5_110118_FC62VT6AAXX-hg18-unsort.bam.bai

Edit: From the comments it appears as if samtools will sometimes raise an error in this case. Here is what an unsorted BAM file that finishes with a truncated index and no error messages looks like:

% samtools view 5_110118_FC62VT6AAXX-hg18-unsort.bam | head
HWI-EAS264:5:1:1353:936#0       77      *       0       0       *       *       0       0       NTTTCCATTCCATTATGGATGATTCCATTCCATTGTATTC        ########################################        XM:i:0
HWI-EAS264:5:1:1353:936#0       141     *       0       0       *       *       0       0       TGGAACGGAATGGAATGGAATGGAATGGAATCAATCCGAC        DDGGDEGGGGDG@BGGG=@G:EFFEEB4FF3BBB@B<EBA        XM:i:0
HWI-EAS264:5:1:1463:944#0       99      chr4    79057890        255     40M     =       79058357       507      NTAGATATGTAAATAAAGTACAGGAAAATGTGGTGAGTAT        ########################################       XA:i:1   MD:Z:0A39       NM:i:1
HWI-EAS264:5:1:1463:944#0       147     chr4    79058357        255     40M     =       79057890       -507     AAAATATATACCAGGTAGATAAAGGGTCGTTCGTATTTCA        @B2EEDEB8@ED@DE?,A:?:DGEGDD:GGA@=8@182@A       XA:i:0   MD:Z:40 NM:i:0
HWI-EAS264:5:1:1534:933#0       77      *       0       0       *       *       0       0       NACTCCATTCAATTCCATTCCTTTTAGCTCGGGTTGTTTC        ########################################        XM:i:0
HWI-EAS264:5:1:1534:933#0       141     *       0       0       *       *       0       0       AATGGAATGGAATGGAATAAATCCCGGGGGAGTGGAATGG        BD4CC=G@D@BB-A?BB-:B==DB,==*==)4&::A=>CB        XM:i:0
HWI-EAS264:5:1:1691:931#0       99      chr18   28072372        255     40M     =       28072771       439      NCTGGGTTTTTATGTCTTCCAGCTAGTAAGTGCTGGAGCT        #322244442<:22<@@C@C@@;@;646666446664536       XA:i:1   MD:Z:0G39       NM:i:1
HWI-EAS264:5:1:1691:931#0       147     chr18   28072771        255     40M     =       28072372       -439     GTTCTCCCGCCTGCACCTCTCTCTGTGTGTTGGCTTTGTT        >B@D>ADDBDD>BD-DDDEDGGFHIIFI@HIIIHHIIIIH       XA:i:0   MD:Z:40 NM:i:0
ADD COMMENTlink modified 8.5 years ago • written 8.5 years ago by Brad Chapman9.4k
3

STrange... we have sorted BAM files with @HD VN:1.0 SO:unsorted in the header... I need to investigate further it seems...

ADD REPLYlink written 8.5 years ago by Tim_Yates110
1

Thanks very much Heng, that did it. I now get the error '[bam_index_core] the alignment is not sorted: reads without coordinates prior to reads with coordinates.' I appreciate you looking into this.

ADD REPLYlink written 8.5 years ago by Brad Chapman9.4k

The only problem is that changing that flag is program dependent and not required by the BAM spec. Picard does set the flag correctly, although samtools sort does not. I remember reading on SeqAnswers that Heng was planning on implementing this to avoid confusion; he can probably chime in if that's still a plan. Another tool to check for "bad" indexes on unsorted files is 'samtools idxstats' which might be nicer than looking at the size.

ADD REPLYlink written 8.5 years ago by Brad Chapman9.4k

Which version of samtools is indexing an unsorted file? I happen to have version 0.1.8 (r613) here and it stops with a message that the alignments are not sorted if I try to index a file sorted by queryname.

ADD REPLYlink written 8.5 years ago by iw9oel_ad6.0k

That's with the most recent version: 0.1.12a (r862). I would prefer to get an error.

ADD REPLYlink written 8.5 years ago by Brad Chapman9.4k

Same observation as keith I do get an error: [mdo041@astrakan ~]$ samtools-0.1.12a/samtools index test.fas.blat.sam.bam [bam_index_core] the alignment is not sorted (NG-1755): 2158255 > 2132773 in 1-th chr using a newly imported sam file. Brad: I think the reason for not seeing an error is that the file you are trying is actually sorted just by chance. So I don't know about the flags.

ADD REPLYlink written 8.5 years ago by Michael Dondrup46k

Try the version here: https://github.com/lh3/samtools

ADD REPLYlink written 8.5 years ago by lh331k

Same result for me: [bam_index_core] the alignment is not sorted

ADD REPLYlink written 8.5 years ago by Michael Dondrup46k

Thanks. I mainly ask Brad to try it on the unsorted file for which samtools does not give any error messages.

ADD REPLYlink written 8.5 years ago by lh331k

Michael and Keith, the file is definitely unsorted. Not sure what I'm doing different. Heng, I tried the GitHub version and got the same behavior. Edited the answer above with the first few lines of the file if that helps; it's from a paired end Bowtie alignment.

ADD REPLYlink written 8.5 years ago by Brad Chapman9.4k

Hmm... Please try again. Updated a minute ago.

ADD REPLYlink written 8.5 years ago by lh331k

I have realized that such input will cause problems, but it seems that the fix is not correct. Please try again. Updated a minute ago.

ADD REPLYlink written 8.5 years ago by lh331k
11
gravatar for martin.triska
2.3 years ago by
martin.triska110
martin.triska110 wrote:

If anyone arriving to this thread years later as me, you can run:

samtools stats <file.bam> | grep "is sorted:"

(I'm using the samtools 1.4, don't know when was this added to stats) Enjoy :)

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by martin.triska110

If it gives back "is sorted: 1" means it is, indeed, sorted? I want to corroborate because the Samtools manual does not say anything about the meaning of this output code (in the manual entry for the stats command; I mean, something like explaining that 1 is sorted and 0 is not).

ADD REPLYlink modified 12 weeks ago • written 4 months ago by msimmer92200

Samtools did check the order. In the source code:

if ( stats->tid==bam_line->core.tid && bam_line->core.pos<stats->pos )
stats->is_sorted = 0;

and

fprintf(to, "SN\tis sorted:\t%d\n", stats->is_sorted ? 1 : 0);

Link to source on github

ADD REPLYlink written 3 months ago by ColorfulBlack0
6
gravatar for Michael Dondrup
8.5 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

Hi,

I think if you run samtools index, it will throw an error if the file was not sorted before, can use return code e.g. in a script. Otherwise it would do no harm to run sort again, would it?

Edit: So there seems to be some confusion here about flags and samtools index, but the safest method to make sure the file is really sorted is to always run samtools sorton the bam file. This changes nothing if the file is already sorted. Also, I have never seen samtools index work with a non-sorted file, I am refering to 0.1.12a, exept in the case that the imported sam file was already sorted from the beginning, thus this method is safe, imho.

ADD COMMENTlink modified 8.5 years ago • written 8.5 years ago by Michael Dondrup46k
2
gravatar for iw9oel_ad
8.5 years ago by
iw9oel_ad6.0k
iw9oel_ad6.0k wrote:

Unless you have extra information about the BAM file, it is not possible to tell without scanning the file. Extra information might be that you know that the optional sort order field in the header is present and correct, you know the complete history of the file or you have the expected checksum of the sorted data.

Otherwise, you can only be sure of the sort order by tracking the reference sequence and coordinates of the previous and current alignments while processing the file (in the case of coordinate sorting), or by pre-scanning with a program that performs this check. This is what 'samtools index' does while indexing.

It may be too expensive to pre-scan a huge BAM file to check for sortedness, so the cheapest way is to do it while you work on the data.

ADD COMMENTlink modified 8.5 years ago • written 8.5 years ago by iw9oel_ad6.0k
1
gravatar for Tim_Yates
8.5 years ago by
Tim_Yates110
Tim_Yates110 wrote:

I ran into this problem recently, and no it looks like there is no way of telling.

The best I could come up with, was requiring sorted files to then be indexed to generate a FILE.bai file. Any BAM files that were missing this paired file were sorted and indexed as part of the pipeline...

ADD COMMENTlink written 8.5 years ago by Tim_Yates110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1993 users visited in the last hour