Question: Confused About SOLiD RNA-Seq Paired-End Data
8.6 years ago by
Conan20 wrote:

Hi everyone, I'm new to RNA-Seq data analysis, and some aspects of strand-specific information confuse me. Any suggestions would be greatly appreciated.

Say I have SOLiD paired-end RNA-Seq data (50 x 35 bp) from a strand-specific library. I mapped the reads with TopHat using the parameter "--library-type fr-secondstrand" and got accepted_hits.bam.

Now I want to see whether any transcripts are transcribed from the antisense strand. That is, if a gene lies on the forward strand of the chromosome, I want to see whether some reads map to the reverse strand, which could indicate antisense transcription.

For this purpose I should extract the reads mapping to each strand separately and then compare them. But I have some questions:

  1. In SOLiD paired-end data, do the F3 and F5 reads of a pair map to different strands? That is, if a gene lies on the forward strand, is the pair F3(+)/F5(-), and is it F3(-)/F5(+) for a gene on the reverse strand? I read the SOLiD protocol and examined my BAM file in IGV, and both suggest this is the case. But the last post in another thread said that the F3 and F5 reads of a pair are actually on the same strand, so I'm not sure which is correct. Any suggestion, discussion or comment is welcome. Thanks!

  2. Although I used the "--library-type" parameter in the TopHat mapping, I still don't understand the manual's explanation of the three library types. Can anyone explain them clearly? Thanks.

  3. It is said that the XS:A tag indicates which strand a read comes from. But in my data both the F3 and F5 reads are XS:A:+ when they map to a gene on the forward strand, while F3 is shown as + and F5 as -. So I wonder whether the XS:A tag only tells us the gene orientation, or whether I made a mistake somewhere in the procedure.
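
For readers following question 3, here is a minimal Python sketch of how a transcript strand can be derived from a read's SAM FLAG under a stranded library type. It assumes the usual reading of the TopHat manual, in which the first mate of an fr-secondstrand pair aligns in the sense of the transcript and the second mate in antisense, with fr-firststrand being the opposite. The function name `transcript_strand` is hypothetical, and whether TopHat treats SOLiD F3/F5 pairs exactly this way is not confirmed in this thread.

```python
# SAM FLAG bits used below (values from the SAM specification).
READ_REVERSE = 0x10    # this read is aligned to the reverse strand
SECOND_IN_PAIR = 0x80  # this read is the second mate of its pair


def transcript_strand(flag: int, library_type: str = "fr-secondstrand") -> str:
    """Return '+' or '-' for the strand the fragment is presumed to derive from.

    Hypothetical helper: it encodes the assumed convention that, for
    fr-secondstrand, mate 1 is sense and mate 2 is antisense to the transcript.
    """
    if library_type not in ("fr-firststrand", "fr-secondstrand"):
        raise ValueError(f"unsupported library type: {library_type}")

    read_is_reverse = bool(flag & READ_REVERSE)
    is_second_mate = bool(flag & SECOND_IN_PAIR)

    # Does this mate align in the same sense as the transcript?
    same_sense = not is_second_mate
    if library_type == "fr-firststrand":
        same_sense = not same_sense

    on_forward = (not read_is_reverse) if same_sense else read_is_reverse
    return "+" if on_forward else "-"


if __name__ == "__main__":
    for flag in (99, 147, 83, 163):
        print(flag, transcript_strand(flag))
```

Under these assumptions, a pair with FLAG 99 (first mate, forward) and FLAG 147 (second mate, reverse) yields '+' for both mates. That would be consistent with both reads carrying XS:A:+ while being displayed on opposite strands.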

8.6 years ago by
Istvan Albert ♦♦ 86k (University Park, USA) wrote:

The names F3 and F5 indicate where the fragments come from. In this protocol both mates come from the same strand.

You might also want to read this:

Note that the most common paired-end protocols produce F3, R3 reads, and most tools expect that. You would almost certainly need to make sure that the tool supports data in your format, and then explicitly invoke this custom behavior. The tools cannot detect this.

Alternatively, you can just reverse complement your color space reads (that means reversing the colors); alas, this too has some implications that can bite later on.

Long story short: kind of tedious.


Thank you for the excellent explanation, Albert. Now that I understand F3 and F5, my mapping result seems strange. As I said, I used TopHat to map SOLiD RNA-Seq PE data with --library-type fr-secondstrand, then checked accepted_hits.bam in IGV. Most of the properly paired reads show the F3 alignment on (+) and the F5 alignment on (-) when they map to a gene on the forward strand. I also grepped one specific properly paired read pair and found FLAG values of 99 and 147, which means the second read in the pair is mapped to the reverse strand. So I still need to make sure:

Don't the +/- alignments in IGV indicate the positive and negative strands that the reads map to? If so, this contradicts "F3 and F5 fragments come from the same strand". So I don't know whether I'm just confused or whether something is wrong with my data.
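
The FLAG values 99 and 147 mentioned above can be decoded with a few lines of Python. This is only a sketch: the bit values come from the SAM specification, while the short bit names and the helpers `decode_flag` and `alignment_strand` are made up for illustration.

```python
# SAM FLAG bits (values from the SAM specification; the names are informal).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def alignment_strand(flag: int) -> str:
    """Strand of the alignment itself, taken from the 0x10 bit."""
    return "-" if flag & 0x10 else "+"


for flag in (99, 147):
    print(flag, alignment_strand(flag), decode_flag(flag))
```

FLAG 99 decodes to a first-in-pair read aligned forward whose mate is reversed, and FLAG 147 to a second-in-pair read aligned in reverse. The 0x10 bit describes how the read sequence aligns to the reference, which is a separate question from which strand the original fragment was transcribed from.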

written 8.6 years ago by Conan20