Question: samtools merge warning
2.1 years ago by
lshepard340
United States
lshepard340 wrote:

Hi,

I have a question regarding a warning that I am observing while merging bam files with samtools.

For simplicity here is the command I used for a single pair of files:

samtools merge merged.bam File1-sorted.bam File2-sorted.bam

Note that I simply added the 'sorted' on input files just to clarify that these files were previously sorted with samtools.

After running 'merge', the following warning is issued:

'Order of targets in file File2.bam caused coordinate sort to be lost'

Unfortunately, I am having a hard time finding more details about what this means and how to avoid it. I would appreciate any information. Thanks!

next-gen • 2.0k views
ADD COMMENTlink modified 2.1 years ago by Devon Ryan90k • written 2.1 years ago by lshepard340

What is the output of:

samtools view -H File2-sorted.bam | head
ADD REPLYlink written 2.1 years ago by Pierre Lindenbaum121k

Hi Pierre,

The output from your command is:

@HD     VN:1.0  SO:coordinate
@SQ     SN:chr1 LN:290094216
@SQ     SN:chr10        LN:112200500
@SQ     SN:chr10_AABR06110104_random    LN:1013
@SQ     SN:chr10_JH620367_random        LN:1765
@SQ     SN:chr10_AABR06110107_random    LN:780
@SQ     SN:chr10_AABR06110108_random    LN:4563
@SQ     SN:chr10_AABR06110109_random    LN:2250
@SQ     SN:chr10_AABR06110110_random    LN:2082
@SQ     SN:chr10_AABR06110111_random    LN:2352

Please, let me know if you need anything else. Thanks!

ADD REPLYlink written 2.1 years ago by lshepard340

SO:coordinate in the first line of your BAM header says that your file is coordinate-sorted (SO = Sort Order). Merging the two files can break this sorting, which is why samtools generates a warning. Don't worry, just re-sort the merged BAM and it should be fine.
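
The re-sort suggested here could look like the sketch below. It assumes samtools 1.3 or later (where `sort` takes `-o`); `resort_merged` and the output name `merged.sorted.bam` are hypothetical names chosen for illustration, not anything from the thread.

```shell
# Re-sort a merged BAM by coordinate, then index the result.
# resort_merged is a hypothetical wrapper; only the two samtools
# calls inside it are real commands.
resort_merged() {
    in_bam=$1
    out_bam=$2
    samtools sort -o "$out_bam" "$in_bam" &&
    samtools index "$out_bam"
}

# Usage with the file from the question:
#   resort_merged merged.bam merged.sorted.bam
```

Indexing only succeeds on a coordinate-sorted BAM, so running `samtools index` straight after the sort doubles as a quick sanity check.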

ADD REPLYlink written 2.1 years ago by Santosh Anand4.9k

Hi Santosh, thanks for the clarification, I will re-sort the file. But does that mean that sorting before merging is not necessary? I always thought sorting before merging was for the better, but this suggests that there should be two sorting steps for my files?

Thanks again!

ADD REPLYlink written 2.1 years ago by lshepard340

Actually, sorting is required for merging. The point of merging is not just to concatenate the two files, but also to preserve the sort order and create a well-formatted header. From the samtools manual: http://www.htslib.org/doc/samtools.html

Merge multiple sorted alignment files, producing a single sorted output file that contains all the input records and maintains the existing sort order.

If -h is specified the @SQ headers of input files will be merged into the specified header, otherwise they will be merged into a composite header created from the input headers. If in the process of merging @SQ lines for coordinate sorted input files, a conflict arises as to the order (for example input1.bam has @SQ for a,b,c and input2.bam has b,a,c) then the resulting output file will need to be re-sorted back into coordinate order.
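
To see whether two inputs disagree on @SQ order, as in the manual's a,b,c versus b,a,c example, one option is to list the reference names from each header and compare the lists. This is only a sketch: `sq_names` is a hypothetical helper (not a samtools command), and the `.sq` file names in the usage comments are made up.

```shell
# Print the reference names (the SN: field) of all @SQ lines, in header order.
# Reads SAM header text on stdin.
sq_names() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Usage with the files from the question (requires samtools on PATH):
#   samtools view -H File1-sorted.bam | sq_names > file1.sq
#   samtools view -H File2-sorted.bam | sq_names > file2.sq
#   diff file1.sq file2.sq && echo "same @SQ order"
```

Any output from `diff` points to references that are ordered differently or present in only one file.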

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Santosh Anand4.9k

That last sentence is the important one here.

ADD REPLYlink written 2.1 years ago by Devon Ryan90k

Absolutely! That explains it all.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Santosh Anand4.9k

Perfect, that is what I thought, but I wanted to confirm and make sure I didn't misunderstand.

ADD REPLYlink written 2.1 years ago by lshepard340
2.1 years ago by
Devon Ryan90k
Freiburg, Germany
Devon Ryan90k wrote:

The issue is that your headers are in different orders or one file has chromosomes/contigs that the other doesn't. Consequently, while the input files might be nicely sorted, it's not immediately clear that the output will be properly sorted. As Santosh mentioned, you can just re-sort the merged file to fix this.
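
To check whether a given file really is in coordinate order, one can stream it as SAM text and compare each record against the previous one, ranking references by their position in the @SQ header lines. The sketch below assumes that convention (reference order first, then position, with reads lacking a reference last); `check_coord_sorted` is a hypothetical helper, not part of samtools.

```shell
# Check that SAM records on stdin are in coordinate order relative to the
# @SQ order of the header. Prints "sorted" or the first out-of-order record.
check_coord_sorted() {
    awk -F'\t' '
        $1 == "@SQ" {                      # remember the rank of each reference
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) rank[substr($i, 4)] = ++n
            next
        }
        /^@/ { next }                      # skip the other header lines
        $3 == "*" { unmapped = 1; next }   # reads without a reference go last
        {
            r = rank[$3]; p = $4 + 0
            if (unmapped || r < prev_r || (r == prev_r && p < prev_p)) {
                print "out of order: " $1 " " $3 ":" $4
                bad = 1
                exit
            }
            prev_r = r; prev_p = p
        }
        END { if (!bad) print "sorted"; exit bad + 0 }
    '
}

# Usage (requires samtools on PATH):
#   samtools view -h merged.bam | check_coord_sorted
```

This reads the whole file, so on large BAMs it is a one-off diagnostic rather than something to run routinely.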

The bigger question is really how this happened to begin with. I presume you downloaded one of the files or that they in some way came from different sources. If you made both of these yourself, then either you used two different indices, or aligners that spit things out in different orders (that's not good) or something along those lines. If this is the case then it should be fixed because it'll cause you untold problems that you don't even know about yet.

ADD COMMENTlink written 2.1 years ago by Devon Ryan90k

Thanks for this clarification! Also I was wondering why the sort order is destroyed, though I guessed it based on the error message :)

ADD REPLYlink written 2.1 years ago by Santosh Anand4.9k

Hi Devon, thanks for the input. I will re-sort the files as suggested. As for your question about how this happened in the first place: these files are the output of an Ion Torrent sequencing run. Unlike for other platforms, the suggested approach is a 'two-step alignment': you first align with TopHat2 (STAR may also be used), then align the remaining unaligned reads with Bowtie2 alone (in soft-clipping local mode).

The index used was the same. I just noticed that the Bowtie2 output was unsorted (the TopHat2 output was already sorted by coordinate), so I sorted all the files before merging. I am assuming the two-step alignment might be the reason, but I am not sure how it can be avoided for this particular NGS platform. If you know anything, I would appreciate any info! :) Thanks!

ADD REPLYlink written 2.1 years ago by lshepard340