Split BAM file from paired end read into forward and reverse strand w.r.t reference genome
1
0
Entering edit mode
4.2 years ago

Hello,

I have a paired end data and I have an alignment file (SAM/BAM). I want to split it into forward and reverse strand w.r.t reference genome. Looking at the SAM file, I am getting reads having 83,163,99 and 147 flags. So which flags I should consider for forward and reverse? Please help me out.

Thanks,

Susmita

RNA-Seq NGS SNP Variant Calling samtools • 3.8k views
ADD COMMENT
0
Entering edit mode
4.2 years ago
ATpoint 81k

Check this conversion tool for the flag to set when read/mate is reverse strand: http://broadinstitute.github.io/picard/explain-flags.html

ADD COMMENT
0
Entering edit mode

I did looked at it and many other web pages and I'm more confused now

ADD REPLY
0
Entering edit mode

It is quite simply actually once you get the logic. Flag 16 means the read is reverse strand.

Therefore:

samtools view -F 16 -o onlyForward.bam input.bam
samtools view -f 16 -o onlyReverse.bam input.bam

-F 16 means "do not use those that are reverse strand, so "use only forward", and -f 16 means "use reverse strand". samtools is smart, so these flags alone are sufficient to filter by strand. Your flags, e.g. 83 contain additional information but this is not required here. The minimal flags I provided should do the trick.

ADD REPLY
0
Entering edit mode

The thing is I have tried this command and I'm getting overlapped reads between the two strands and it should not happen. I'm doing allele specific variant calling. I should say the rna-seq is not strand-specific, would that be an issue?

ADD REPLY
0
Entering edit mode

What do you mean by "overlapping"? Well yeah, non-strand specific would be a problem for strand-specific analysis ;-)

ADD REPLY
0
Entering edit mode

Well I'm doing by creating personalized genomes (mouse strains). By overlapping I mean, lets take Xist and Tsix, antisense to each other, but most of the reads are coming from the same allele in these genes and other genes too. Could it be a case of allelle mapping bias?

ADD REPLY
0
Entering edit mode

Hmm, I think to answer this you really need a stranded library prep.

ADD REPLY
0
Entering edit mode

So there is no way to avoid that, in an unstranded RNA-Seq? :(

ADD REPLY
0
Entering edit mode

Are you sure this is unstranded data? Now a days people rarely do unstranded preps. Are you using publicly available data and are just not sure if the prep is stranded?

ADD REPLY
0
Entering edit mode

I am very sure. It's not publicly available data.

ADD REPLY

Login before adding your answer.

Traffic: 1902 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6