STAR split reads into two strands
1
0
Entering edit mode
13 months ago
Alexander • 0

Hi all,

I'm trying to find a way to split reads mapped with STAR into two separate bam-files, one for each strand. STAR is able to do that when creating bedgraph or bigwig outputs (--outWigType and --outWigStrand), however I can not find any option to split raw reads into two separate files correponding to the strand they were transcribed from. I do not want to get counts per gene, I know how to do that, instead I need to get actual reads for each strand. Any ideas?

Thanks a lot in advance

rnachip stranded split STAR • 1.6k views
0
Entering edit mode

Based on strand of genome, or transcript?

0
Entering edit mode

Based on strand of genome

0
Entering edit mode

Why can't you split on the binary flags with samtools?

0
Entering edit mode

Because this will give me 50/50 split. 50% of all reads will map to plus strand and another 50% will map to minus strand, but what I care about is from which strand of the genome the transcript came from. May be I was not clear?

1
Entering edit mode

You can filter for both alignment direction and which read of the pair you are looking at.

0
Entering edit mode

Hmmm.. do you mean flags for first in pair and second in pair? Yea, I thought about scripting this, but decided to check if there is a ready solution. Is it such a rare thing to do that no available tool offer such a function?

1
Entering edit mode

I'm not sure why you need a script. samtools view -f 80 will, for instance, get all the R1 reads that map to your genome in reverse. Isn't that one of the categories you want?

0
Entering edit mode

Well... technically yes, but I probably want to check that R1 is indeed downstream of R2 by the coordinates with this filtering, just to be on the safe side. May be I'm overcomplicating things :)

1
Entering edit mode

Okay, won't f -82 do that for you, by demanding proper pairing? Is this a non-model organism, so you can't just look up the orientation of each gene, instead of deducing it from reads?

0
Entering edit mode

Sounds good! Thanks a lot for the ideas! Good discussion always helps :) I will do first filtering for -f 2 and then do further filtering steps on the outcome file and put it together.

0
Entering edit mode
13 months ago
Divon ▴ 180

You can use my Genozip program:

genozip myfile.bam
genocat myfile.bam.genozip --FLAG -0x10 --output myfile.forward-strand-reads.bam
genocat myfile.bam.genozip --FLAG +0x10 --output myfile.reverse-strand-reads.bam


See here: https://genozip.com

0
Entering edit mode

Thanks. Hm... but how different is it from usng samtools tags? Would not it give me 50/50 split for all reads in the bam file?

0
Entering edit mode

It will give you the strand of the reference genome that your read maps to. For example, if your entire data is from a single RNA molecule, then it will all map to the same strand. Whether that is actually the strand your RNA came from in the DNA of the individual you sampled from, depends on whether their DNA matches the reference genome in that particular region.

0
Entering edit mode

It will give you the strand of the reference genome that your read maps to

Yes, but in paired end sequencing one read will always map to the opposite strand, regardless of the transcript origin, so I will get 50/50 split. What I care about is that both mates for read pairs that came from transcript produced in sense orientation would be labeled as originating from plus strand, and both mates from read pairs that came from anti-sense transcript would be labeled as originating from minus strand.