Manually edit the flags in a sam file
1
0
Entering edit mode
8.6 years ago
DVA ▴ 630

Hello,

I have a question about editing the flag number in sam/bam files. For my particular project, I will need to manually edit the flag after alignment, mainly because the orientation of reads is sometimes opposite as it should be. For example, the aligner may give a flag of 99, which means 5'-3' orientation (on the Waston (positive) strand); but in fact it should be 147, which means 3'-5'.

I think I could do it in binary for bam, because orientation is only one binary numbers for flag; but I'm not sure how to do it directly in sam since the number is combined with all other situations (e.g. 1 or 2 pair, mapped/unmapped, etc...). Any one has some suggestions please?

Thank you very much in advance for your attention:)

sam • 3.6k views
ADD COMMENT
1
Entering edit mode

It should not be very difficult with a simple script. Get an idea on how to split the combined number. http://www.samformat.info/#/flag

This post might help you as well to get an idea: Tool to unmark duplicates Its not exactly same but deals with modifying samflags

ADD REPLY
0
Entering edit mode

Thank you. I've been using https://broadinstitute.github.io/picard/explain-flags.html for the same purpose. I guess I will need to write a script for this. Appreciate your help.

ADD REPLY
2
Entering edit mode
8.6 years ago

You end up not being able to directly modify either file, since in the case of BAM files everything is compressed and the modified file size is likely to be ever so slightly different and in the case of SAM files for the reason you mentioned. Just write a little script using pysam (or even just use awk) and pipe everything through to fix the flags.

BTW, what program produced incorrect flags? I'll make a note to avoid it.

ADD COMMENT
0
Entering edit mode

Thank you @Devon Ryan. I'm working on the script:p It's not the program's fault to give incorrect flags though - it's because of our sequencing method.

ADD REPLY
0
Entering edit mode

Hmm, that's quite odd. Normally aligners for BSseq data do an in silico conversion before alignment, so it shouldn't matter how much of the sample was actually bisulfite treated.

Out of curiosity, are you allowed to say why you only 20% treated the sample?

ADD REPLY
0
Entering edit mode

To my knowledge, for bsseq, aligners know which strand is positive by looking at if Cs are all (or mostly) changed into Ts; and know which is negative if Gs are all changed into As. (This is just to my understanding, so please correct me if I'm wrong)

ADD REPLY
1
Entering edit mode

I've written a BSseq aligner so I guess I'll correct you a bit :)

How most aligners work internally is that they will convert all Cs to Ts in read 1 and Gs to As in read 2 and align that to a C->T or G->A converted genome (depending on the library type, there may also be a G->A conversion of read 1 and a C->T conversion of read 2). Thus, it doesn't really matter how converted your fragments are, the aligner will completely convert everything (regardless of whether a base was methylated or not) prior to alignment. Consequently, the strand should be correct regardless.

I should note that a few aligners try to just not count C->T differences as a mismatch, but that inevitably leads to bias and such aligners should be avoided.

ADD REPLY
0
Entering edit mode

(sorry I deleted an earlier reply) Thanks a lot for your clarification. We are using BSMAP, which does use the C-T tolerant algorithm you mentioned. However, I'm not sure how this would lead to orientation error though. I'm going to look more into both your paper and theirs.

ADD REPLY

Login before adding your answer.

Traffic: 1897 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6