Hi everyone, I'm new to RNA-Seq data analysis, and a few things about strand-specific information have me quite confused. Any suggestions would be greatly appreciated.
Say I have SOLiD paired-end RNA-Seq data, 50 x 35 bp, from a strand-specific library. I mapped the reads with TopHat using the parameter "--library-type fr-secondstrand" and got accepted_hits.bam.
Now I want to see whether any transcripts are transcribed from the antisense strand. That is, if a gene lies on the forward strand of a chromosome, I want to see whether some reads map to the reverse strand, which could indicate antisense transcription.
For this purpose I need to extract the reads mapping to each strand separately and then compare them. But I have the following questions:
1. In SOLiD paired-end data, do the F3 and F5 reads of a pair map to different strands? That is, if a gene lies on the forward strand, is the pair F3(+)/F5(-), and is it F3(-)/F5(+) for a gene on the reverse strand? The SOLiD protocol says so, and looking at my BAM file in IGV confirmed it. However, the last post in this thread, http://seqanswers.com/forums/showthread.php?t=6317, says the F3 and F5 reads of a pair are actually on the same strand, so I'm not sure which is correct. Any suggestion, discussion or comment is welcome. Thanks!
2. Although I used the "--library-type" parameter in the TopHat mapping, I still don't understand the manual's explanation of the three library types. Can anyone explain them clearly? Thanks.
3. The XS:A tag is said to indicate which strand a read comes from. But in my data, for a gene on the forward strand, both the F3 and F5 reads carry XS:A:+, even though F3 aligns to + and F5 aligns to -. So does the XS:A tag only tell us the gene orientation, or did I make a mistake somewhere in the procedure?
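For the extraction step mentioned above, here is a minimal sketch of how I think the reads could be tabulated. It reads SAM-format text lines and relies only on the standard SAM FLAG bits (0x10 for reverse-strand alignment, 0x40/0x80 for first/second mate) plus the XS:A tag. The function names `parse_sam_line` and `count_by_xs` are my own, not part of TopHat or samtools, and I am not assuming which mate is F3 or F5; that depends on how the reads were given to the mapper.

```python
def parse_sam_line(line):
    """Return read name, mate number, alignment strand and XS tag for one SAM record."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    # FLAG bit 0x10: the read is aligned to the reverse strand of the reference.
    strand = "-" if flag & 0x10 else "+"
    # FLAG bits 0x40 / 0x80: first / second mate of the pair.
    if flag & 0x40:
        mate = 1
    elif flag & 0x80:
        mate = 2
    else:
        mate = 0  # unpaired, or mate information missing
    # Optional fields start at column 12; look for the XS:A:+ / XS:A:- tag.
    xs = None
    for tag in fields[11:]:
        if tag.startswith("XS:A:"):
            xs = tag[5:]
    return {"name": fields[0], "mate": mate, "strand": strand, "xs": xs}


def count_by_xs(lines):
    """Count alignments per XS value ('+', '-', or None when the tag is absent)."""
    counts = {}
    for line in lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        xs = parse_sam_line(line)["xs"]
        counts[xs] = counts.get(xs, 0) + 1
    return counts
```

The input lines could come from the text output of `samtools view accepted_hits.bam`. Comparing `strand` against `xs` for each mate would show directly whether the two mates of a pair share the same XS value while aligning to opposite strands, which is what question 3 is about.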