Hi all,
I'm trying to find a way to split reads mapped with STAR into two separate bam-files, one for each strand. STAR is able to do that when creating bedgraph or bigwig outputs (--outWigType and --outWigStrand), however I can not find any option to split raw reads into two separate files correponding to the strand they were transcribed from. I do not want to get counts per gene, I know how to do that, instead I need to get actual reads for each strand. Any ideas?
Thanks a lot in advance
Based on strand of genome, or transcript?
Based on strand of genome
Why can't you split on the binary flags with samtools?
Because this will give me 50/50 split. 50% of all reads will map to plus strand and another 50% will map to minus strand, but what I care about is from which strand of the genome the transcript came from. May be I was not clear?
You can filter for both alignment direction and which read of the pair you are looking at.
Hmmm.. do you mean flags for first in pair and second in pair? Yea, I thought about scripting this, but decided to check if there is a ready solution. Is it such a rare thing to do that no available tool offer such a function?
I'm not sure why you need a script.
samtools view -f 80
will, for instance, get all the R1 reads that map to your genome in reverse. Isn't that one of the categories you want?Well... technically yes, but I probably want to check that R1 is indeed downstream of R2 by the coordinates with this filtering, just to be on the safe side. May be I'm overcomplicating things :)
Okay, won't f -82 do that for you, by demanding proper pairing? Is this a non-model organism, so you can't just look up the orientation of each gene, instead of deducing it from reads?
Sounds good! Thanks a lot for the ideas! Good discussion always helps :) I will do first filtering for -f 2 and then do further filtering steps on the outcome file and put it together.