Question: Samtools Merge/Reheader Of Bam Files With Different @Sq Lines
gravatar for Nathan S. Watson-Haigh
6.1 years ago by
Nathan S. Watson-Haigh190 wrote:

I have a bunch of SAM/BAM files which have a different subset of reference sequences missing from the headers (@SQ) lines. samtools merge won't merge them because they have a mismatch in the @SQ lines. I thought specifying -h with a file containing the union of all @SQ lines should work, but it still complains about the mismatch in the @SQ lines of the BAM files being merged.I also tried using samtools reheader but was again unsuccessful!

Are there any concrete examples of using samtools merge or samtools reheader to merge BAM files where the @SQ lines of the BAM files to be merged have a different subset of @SQ lines missing?

Cheers, Nathan

ngs samtools • 5.6k views
ADD COMMENTlink modified 2.5 years ago by Biostar ♦♦ 20 • written 6.1 years ago by Nathan S. Watson-Haigh190
gravatar for Devon Ryan
6.1 years ago by
Devon Ryan92k
Freiburg, Germany
Devon Ryan92k wrote:

Hi Nathan, you'll not find concrete examples of that since it won't work with the current implementation of samtools. In brief, the header contains a chromosome/contig list and individual reads simply have a numeric index into this list, for use with viewing and sorting. In the current implementation, samtools grabs the header from one file and uses that for subsequent files, so the numeric index->chromosome/contig name pairing will be off for the remainder of your files. The same issues hold for the reheader command. It simply changes the header, without touching the reads (this is convenient if you need to change "chr1" into "1", but will completely ruin a file if you rearrange the order at all, since you might end up with "chr2" turning into "17"). Someone could implement a merge to handle your case, but I'm not aware of anyone having done so. What you might do is create a header that will work for all of your files and then "samtools view file.bam >>header.sam" all of the files together. You can then convert back to BAM and resort. This is pretty far from ideal, but the only way that I'm aware of that would work without reimplementing the samtools merge command, which would have some implementation ambiguities.

ADD COMMENTlink written 6.1 years ago by Devon Ryan92k

It is old but seems gold to me. I am doing exactly what @Devon Ryan pointed out. I am unaware if any new way is implemented, would be an asset to know if someone has done it.

ADD REPLYlink written 3.6 years ago by ivivek_ngs4.8k

There's been a lot of updates to how headers are managed in samtools merge. You might just give this a go with version 1.3 and see if it works :)

ADD REPLYlink written 3.6 years ago by Devon Ryan92k

definitely give it a try . Thanks for the information. Periodic updating of versions are not in fashion here , I was using the one in our cluster, yes definitely can download the version 1.3 and redirect the path in my .bashrc to work a way around. But for later projects.

ADD REPLYlink written 3.6 years ago by ivivek_ngs4.8k

Seems reasonable, though it's a shame that whoever manages the software is averse to providing versioned updated. I manage the software for our institute and provide a bunch of versions of most things, it's not that difficult.

ADD REPLYlink written 3.6 years ago by Devon Ryan92k

The problem lies in the bureaucracy of the institute. One IT guy managing entire resource and in no way the person gives a head to update the versions. We in our group do it most of the time however certain things need to be taken care by the admin which even on periodic tickets does not meet out and on yearly meeting the person just gets scot-free. Probably that is how things are working here. I do not mind updating versions and I do it for most tools except for some projects which are kept for a long time and need to be closed in a hurry. Bioinformaticians are not given charge of admin rights else life would have been much simpler.

ADD REPLYlink written 3.6 years ago by ivivek_ngs4.8k

@Devon Ryan, I actually encountered for another project the same problem, where all the headers in order of chr10 and so on, I was thinking to use the same thing as suggested here to extract the header from a corrected bam file and then use reheader for all the bam files in that order and then sort and index those bam files. But when I read the post of Pierre here I got a bit confused since the bam read alignment records themselves and their tid fields are unchanged as posted by you. So does that mean I should recreate everything from scratch? I usually do not create sam files and I always try to create bam files from alignment method sort them and index but since for this project all of them have wrong order so I have to reorder all the bam files and then sort and index. So creating a proper file and then using them to order all bam files is not the correct way, is it still wrong? I will not introduce any new line. Just when I extract the header from one of those bam file I will manually change it in correct order and then reheader all the bam files to correct one. According to the post it will be wrong? Since the read assignment is unchanged. Please correct me if am wrong?

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by ivivek_ngs4.8k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 3354 users visited in the last hour