Question: Swap strands in BAM file format
5.0 years ago by
Bana10 wrote:

Hi, I hope someone can help me with this; I don't know what I am missing.

I would like to swap positive and negative strands in a BAM file. Below are the steps I am taking:

samtools view -H mybam.bam > header.sam
samtools view -h mybam.bam | awk -F ' ' '$2=($2=="16"?"0":"16")' > Swapped.bam
samtools reheader header.sam Swapped.bam or samtools reheader header.sam Swapped.bam > newBam.bam

But when I try to view the file, I get:

[bam_header_read] EOF marker is absent. The input is probably truncated.
[main_samview] truncated file.

Any ideas are appreciated! 

alignment sequence • 1.6k views
modified 4.6 years ago by Biostar ♦♦ 20 • written 5.0 years ago by Bana10
5.0 years ago by
Istvan Albert ♦♦ 81k (University Park, USA) wrote:

First, don't do that. It sounds like a situation where you are trying to fix something else and your "solution" is to create a BAM file with wrong alignment information ... pain and suffering lie that way.

Second, if you do it (even though you shouldn't), why are you printing the header in the samtools view? And you should split by tabs, '\t'.

Third, if you do it (even though you shouldn't), make sure to flip the fourth bit (the 0x10 flag bit, counting from bit 0) rather than replacing 16 with 0. Awk is probably not the right tool, since standard (POSIX) awk does not offer bitwise operations.
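To illustrate the third point, here is a minimal sketch of the difference between replacing 16 with 0 and flipping only the 0x10 bit. The function names `naive_swap` and `toggle_reverse` are made up for this example, and the sample flags are typical single-end (0, 16) and paired-end (83, 99, 147, 163) values chosen for illustration, not taken from the poster's file.

```python
REVERSE = 0x10  # SAM FLAG bit meaning "read aligned to the reverse strand"


def naive_swap(flag: int) -> int:
    """Mimic the question's awk test: 16 becomes 0, anything else becomes 16."""
    return 0 if flag == 16 else 16


def toggle_reverse(flag: int) -> int:
    """Flip only the 0x10 bit and leave every other flag bit untouched."""
    return flag ^ REVERSE


# Print each flag next to the result of both approaches.
for flag in (0, 16, 83, 99, 147, 163):
    print(flag, naive_swap(flag), toggle_reverse(flag))
```

With paired-end flags such as 83, the equality test discards the pairing bits entirely, while XOR changes only the strand bit. Even then the record is altered in a way the aligner never reported, which is the objection raised above.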

modified 5.0 years ago • written 5.0 years ago by Istvan Albert ♦♦ 81k

1- Thanks! The reason I want to do this (I hope I can explain it correctly) is that when I view the gene regions in the UCSC Genome Browser, my BAM files are on the opposite strand, and when I do calculations on them I have to look at the values for the opposite strand. Hence, a suggestion was to just change the strands.
2- I will use '\t'. I extracted the header with samtools because I was getting an error that there was no header, so I take the header from the original file and put it into the new BAM file at the end.
3- I have also tried and($2,0x10). Is that what you mean by negating the fourth bit?

Any suggestions for a replacement for awk? I can also go ahead without this step (I would just always have to be sure to consider the opposite strand, which sounds like more confusion down the road than changing the strands).

Forgive my naivety if my steps don't make sense. I am trying to understand these BAM files, and your suggestions would be great to have!
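As one possible replacement for awk, here is a hedged Python sketch that works on SAM text lines. It passes header lines (those starting with `@`) through unchanged, splits alignment lines on tabs, and XORs the FLAG column with 0x10. Note that `and($2,0x10)` would only test whether the bit is set; toggling it needs XOR. The function name `flip_strand` and the sample records are invented for illustration.

```python
from typing import Iterable, Iterator

REVERSE = 0x10  # SAM FLAG bit for a reverse-strand alignment


def flip_strand(lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM lines with the 0x10 bit of the FLAG column toggled."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines are passed through untouched
            continue
        fields = line.rstrip("\n").split("\t")  # SAM columns are tab-separated
        fields[1] = str(int(fields[1]) ^ REVERSE)  # column 2 is FLAG
        yield "\t".join(fields) + "\n"


# Hypothetical two-line SAM input: one header line, one reverse-strand read.
example = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]
for out in flip_strand(example):
    print(out, end="")
```

In a script, `sys.stdout.writelines(flip_strand(sys.stdin))` could sit between `samtools view -h mybam.bam` and `samtools view -b -o newBam.bam -`. The final conversion matters: the truncated-file error in the question arises because the awk output is plain SAM text saved under a `.bam` name. This sketch does not touch the mate-strand bit (0x20) of paired reads, so the result may still be inconsistent.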

modified 5.0 years ago • written 5.0 years ago by Bana10

As I suspected, that is absolutely not a valid reason to change your BAM file and make it into an invalid one!

What you probably mean is that your data is strand-specific, with a library prep that sequences the complementary strand. That is fine. Tools are built to operate on that, and once you assemble a transcript it will be in the correct orientation.

As for the display, it is not a big deal. If your instrument sequences the complementary strand, you should never just "choose" to present that as the leading strand because it "looks nicer" that way. That is not what the instrument measured; it amounts to data doctoring.

modified 5.0 years ago • written 5.0 years ago by Istvan Albert ♦♦ 81k

I see. Yes, that's what I meant. Thank you for the explanation! :)

written 5.0 years ago by Bana10