Question: RSeQC infer_experiment.py results interpretation for cuffdiff and featureCounts
tonja.r (UK) asked, 4.7 years ago:

I have mouse RNA-seq data from ENCODE. To analyse it with cuffdiff and featureCounts, I first need to know which library type I have, so that I can choose the right parameters for both tools.

I ran infer_experiment.py from RSeQC and got the following output:

    This is SingleEnd Data
    Fraction of reads failed to determine: 0.0023
    Fraction of reads explained by "++,--": 0.0160
    Fraction of reads explained by "+-,-+": 0.9817

So I need to specify --library-type for cuffdiff. If I interpreted it correctly, I have fr-secondstrand. Is that correct?
For featureCounts I need to specify the -s parameter. It would be -s 2, is that correct? From the featureCounts help:

    -s <int>      Indicate if strand-specific read counting should be performed.
                  It has three possible values:  0 (unstranded), 1 (stranded) and
                  2 (reversely stranded). 0 by default.
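When checking several samples, the report above can also be read programmatically. The sketch below is a minimal Python example that assumes the output format shown in this question (one "Fraction of reads ..." line per pattern); the regular expressions and the `parse_report` helper are hypothetical and are not part of RSeQC.

```python
import re

# Report text copied from the infer_experiment.py output in the question.
REPORT = """\
This is SingleEnd Data
Fraction of reads failed to determine: 0.0023
Fraction of reads explained by "++,--": 0.0160
Fraction of reads explained by "+-,-+": 0.9817
"""

# Assumed line formats, based only on the report above.
EXPLAINED = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')
FAILED = re.compile(r'Fraction of reads failed to determine:\s*([0-9.]+)')


def parse_report(text: str) -> dict:
    """Map each strand-code pattern (plus 'failed') to its fraction."""
    fractions = {code: float(value) for code, value in EXPLAINED.findall(text)}
    failed = FAILED.search(text)
    if failed:
        fractions["failed"] = float(failed.group(1))
    return fractions


if __name__ == "__main__":
    result = parse_report(REPORT)
    print(result)
    # The fractions should add up to (approximately) 1.
    print(round(sum(result.values()), 4))
```

Because the pattern captures whatever sits between the quotes, it would also pick up the different strand codes that paired-end reports use, but that case has not been checked here.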
iraun (Norway) answered, 4.6 years ago:

According to the results you have shown, your library type is fr-firststrand. I'd recommend reading the post "Tophat Library-Type : Illumina Truseq Stranded Total Rna Sample Prep Kit" to clarify what is going on. If you still have questions or doubts, you are welcome to ask again :).

Ah, and fr-firststrand corresponds to -s 2 in featureCounts.
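A minimal sketch of that correspondence, extended to the other two TopHat/Cufflinks library types and used to assemble both command lines. The dictionary, the `build_commands` helper and the file names are illustrative assumptions; only `--library-type` (cuffdiff) and `-s`, `-a`, `-o` (featureCounts) are real options of those tools. The commands are only built as argument lists, not executed.

```python
# TopHat/Cufflinks --library-type -> featureCounts -s
# (0 = unstranded, 1 = stranded, 2 = reversely stranded)
LIBRARY_TYPE_TO_FEATURECOUNTS_S = {
    "fr-unstranded": 0,
    "fr-secondstrand": 1,  # read (1) on the same strand as the transcript
    "fr-firststrand": 2,   # read (1) on the opposite strand, e.g. dUTP libraries
}


def build_commands(library_type: str, gtf: str, bams: list,
                   out: str = "counts.txt") -> tuple:
    """Return (cuffdiff_args, featurecounts_args); hypothetical helper, nothing is run."""
    s_value = LIBRARY_TYPE_TO_FEATURECOUNTS_S[library_type]
    cuffdiff = ["cuffdiff", "--library-type", library_type, gtf, *bams]
    featurecounts = ["featureCounts", "-s", str(s_value), "-a", gtf, "-o", out, *bams]
    return cuffdiff, featurecounts


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cuffdiff_cmd, fc_cmd = build_commands(
        "fr-firststrand", "genes.gtf", ["sample1.bam", "sample2.bam"])
    print(" ".join(cuffdiff_cmd))
    print(" ".join(fc_cmd))
    # -> featureCounts -s 2 -a genes.gtf -o counts.txt sample1.bam sample2.bam
```

Keeping the mapping in a single dictionary means that both tools are always configured from the same library-type decision.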


How did you conclude from the results above that the library type is fr-firststrand?

(reply by tonja.r, 4.6 years ago)

So, to ask again: how did you work that out from the results?

(reply by tonja.r, 4.1 years ago)
Yuka Takemon (Canada/Vancouver/GenomeSciencesCentre) answered, 3.7 years ago:

Twelve months too late, but I disagree with @iraun: for single-end sequencing, a fraction of 0.9817 explained by "+-,-+" would mean unstranded (fr-unstranded). If your reads were stranded, a higher percentage would be explained by "++,--". The documentation isn't the clearest, but you can figure out the pattern of results here: http://rseqc.sourceforge.net/#infer-experiment-py


Actually, this is not correct. The single-end example on the RSeQC website may be a little confusing because it is the opposite of what you typically observe with Illumina stranded libraries (although the page does describe in words what the strand codes represent).

The fractions count reads whose strand matches the annotated gene strand ("++,--") and reads whose strand is opposite to it ("+-,-+"). An unstranded library will have a fraction close to 0.5 for both "++,--" and "+-,-+". dUTP Illumina stranded libraries (where the read strand is the opposite of the strand in the gene annotation) will have "+-,-+" values close to 1.
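The rule described above can be written as a small decision function. This is a sketch for single-end reports only: the 0.8 cutoff is an arbitrary assumption, not a value defined by RSeQC, and the mapping of each outcome to cuffdiff/featureCounts settings follows the answers in this thread.

```python
def infer_strandedness(same_strand: float, opposite_strand: float,
                       stranded_cutoff: float = 0.8) -> str:
    """Guess the library type from single-end infer_experiment.py fractions.

    same_strand     -- fraction explained by "++,--" (read strand == gene strand)
    opposite_strand -- fraction explained by "+-,-+" (read strand != gene strand)
    stranded_cutoff -- hypothetical threshold, not an RSeQC default
    """
    explained = same_strand + opposite_strand
    if explained == 0:
        raise ValueError("no reads were assigned to either strand pattern")
    # Renormalise so the 'failed to determine' fraction does not shift the result.
    same = same_strand / explained
    opposite = opposite_strand / explained
    if opposite >= stranded_cutoff:
        return "fr-firststrand"   # reads antisense to genes (dUTP); featureCounts -s 2
    if same >= stranded_cutoff:
        return "fr-secondstrand"  # reads sense to genes; featureCounts -s 1
    return "fr-unstranded"        # roughly 50/50; featureCounts -s 0


if __name__ == "__main__":
    # Fractions from the question's report.
    print(infer_strandedness(0.0160, 0.9817))  # fr-firststrand
```

With the question's numbers, about 98% of the assignable reads are antisense to the annotated genes, which matches the fr-firststrand / -s 2 answer.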

You can visualize the alignment of reads to GAPDH in IGV to double-check (if you have single-end data, they will be colored by strand).

So, the answer from iraun is correct.

(reply by Charles Warden, 3.7 years ago)

Oops, you're right, Charles. Thanks for correcting me.

(reply by Yuka Takemon, 3.7 years ago)