STAR split reads into two strands
1
0
Entering edit mode
3.2 years ago
Alexander • 0

Hi all,

I'm trying to find a way to split reads mapped with STAR into two separate bam-files, one for each strand. STAR is able to do that when creating bedgraph or bigwig outputs (--outWigType and --outWigStrand), however I can not find any option to split raw reads into two separate files correponding to the strand they were transcribed from. I do not want to get counts per gene, I know how to do that, instead I need to get actual reads for each strand. Any ideas?

Thanks a lot in advance

rnachip stranded split STAR • 3.3k views
ADD COMMENT
0
Entering edit mode

Based on strand of genome, or transcript?

ADD REPLY
0
Entering edit mode

Based on strand of genome

ADD REPLY
0
Entering edit mode

Why can't you split on the binary flags with samtools?

ADD REPLY
0
Entering edit mode

Because this will give me 50/50 split. 50% of all reads will map to plus strand and another 50% will map to minus strand, but what I care about is from which strand of the genome the transcript came from. May be I was not clear?

ADD REPLY
1
Entering edit mode

You can filter for both alignment direction and which read of the pair you are looking at.

ADD REPLY
0
Entering edit mode

Hmmm.. do you mean flags for first in pair and second in pair? Yea, I thought about scripting this, but decided to check if there is a ready solution. Is it such a rare thing to do that no available tool offer such a function?

ADD REPLY
1
Entering edit mode

I'm not sure why you need a script. samtools view -f 80 will, for instance, get all the R1 reads that map to your genome in reverse. Isn't that one of the categories you want?

ADD REPLY
0
Entering edit mode

Well... technically yes, but I probably want to check that R1 is indeed downstream of R2 by the coordinates with this filtering, just to be on the safe side. May be I'm overcomplicating things :)

ADD REPLY
1
Entering edit mode

Okay, won't f -82 do that for you, by demanding proper pairing? Is this a non-model organism, so you can't just look up the orientation of each gene, instead of deducing it from reads?

ADD REPLY
0
Entering edit mode

Sounds good! Thanks a lot for the ideas! Good discussion always helps :) I will do first filtering for -f 2 and then do further filtering steps on the outcome file and put it together.

ADD REPLY
0
Entering edit mode
3.2 years ago
Divon ▴ 230

You can use my Genozip program:

genozip myfile.bam
genocat myfile.bam.genozip --FLAG -0x10 --output myfile.forward-strand-reads.bam
genocat myfile.bam.genozip --FLAG +0x10 --output myfile.reverse-strand-reads.bam

See here: https://genozip.com

Paper: https://www.researchgate.net/publication/349347156_Genozip_-_A_Universal_Extensible_Genomic_Data_Compressor

ADD COMMENT
0
Entering edit mode

Thanks. Hm... but how different is it from usng samtools tags? Would not it give me 50/50 split for all reads in the bam file?

ADD REPLY
0
Entering edit mode

It will give you the strand of the reference genome that your read maps to. For example, if your entire data is from a single RNA molecule, then it will all map to the same strand. Whether that is actually the strand your RNA came from in the DNA of the individual you sampled from, depends on whether their DNA matches the reference genome in that particular region.

ADD REPLY
0
Entering edit mode

It will give you the strand of the reference genome that your read maps to

Yes, but in paired end sequencing one read will always map to the opposite strand, regardless of the transcript origin, so I will get 50/50 split. What I care about is that both mates for read pairs that came from transcript produced in sense orientation would be labeled as originating from plus strand, and both mates from read pairs that came from anti-sense transcript would be labeled as originating from minus strand.

ADD REPLY

Login before adding your answer.

Traffic: 2217 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6