Question: Manually edit the flags in a sam file
gravatar for DVA
3.7 years ago by
United States
DVA520 wrote:


I have a question about editing the flag number in sam/bam files. For my particular project, I will need to manually edit the flag after alignment, mainly because the orientation of reads is sometimes opposite as it should be. For example, the aligner may give a flag of 99, which means 5'-3' orientation (on the Waston (positive) strand); but in fact it should be 147, which means 3'-5'. 

I think I could do it in binary for bam, because orientation is only one binary numbers for flag; but I'm not sure how to do it directly in sam since the number is combined with all other situations (e.g. 1 or 2 pair, mapped/unmapped, etc...). Any one has some suggestions please?

Thank you very much in advance for your attention:) 

sam • 1.7k views
ADD COMMENTlink modified 3.7 years ago by Devon Ryan90k • written 3.7 years ago by DVA520

It should not be very difficult with a simple script. Get an idea on how to split the combined number.

This post might help you as well to get an idea: Tool to unmark duplicates Its not exactly same but deals with modifying samflags

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by geek_y9.7k

Thank you. I've been using for the same purpose. I guess I will need to write a script for this. Appreciate your help.

ADD REPLYlink written 3.7 years ago by DVA520
gravatar for Devon Ryan
3.7 years ago by
Devon Ryan90k
Freiburg, Germany
Devon Ryan90k wrote:

You end up not being able to directly modify either file, since in the case of BAM files everything is compressed and the modified file size is likely to be ever so slightly different and in the case of SAM files for the reason you mentioned. Just write a little script using pysam (or even just use awk) and pipe everything through to fix the flags.

BTW, what program produced incorrect flags? I'll make a note to avoid it.

ADD COMMENTlink written 3.7 years ago by Devon Ryan90k

Thank you @Devon Ryan. I'm working on the script:p It's not the program's fault to give incorrect flags though - it's because of our sequencing method.

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by DVA520

Hmm, that's quite odd. Normally aligners for BSseq data do an in silico conversion before alignment, so it shouldn't matter how much of the sample was actually bisulfite treated.

Out of curiosity, are you allowed to say why you only 20% treated the sample?

ADD REPLYlink written 3.7 years ago by Devon Ryan90k

To my knowledge, for bsseq, aligners know which strand is positive by looking at if Cs are all (or mostly) changed into Ts; and know which is negative if Gs are all changed into As. (This is just to my understanding, so please correct me if I'm wrong)

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by DVA520

I've written a BSseq aligner so I guess I'll correct you a bit :)

How most aligners work internally is that they will convert all Cs to Ts in read 1 and Gs to As in read 2 and align that to a C->T or G->A converted genome (depending on the library type, there may also be a G->A conversion of read 1 and a C->T conversion of read 2). Thus, it doesn't really matter how converted your fragments are, the aligner will completely convert everything (regardless of whether a base was methylated or not) prior to alignment. Consequently, the strand should be correct regardless.

I should note that a few aligners try to just not count C->T differences as a mismatch, but that inevitably leads to bias and such aligners should be avoided.

ADD REPLYlink written 3.7 years ago by Devon Ryan90k

(sorry I deleted an earlier reply) Thanks a lot for your clarification. We are using BSMAP, which does use the C-T tolerant algorithm you mentioned. However, I'm not sure how this would lead to orientation error though. I'm going to look more into both your paper and theirs. 

ADD REPLYlink written 3.7 years ago by DVA520
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1638 users visited in the last hour