RSeQC infer_experiment.py results interpretation for cuffdiff and featureCounts
6.9 years ago
tonja.r ▴ 600

I have RNA-seq data from ENCODE mouse. Before running cuffdiff and featureCounts, I first need to determine the library type so that I can choose the right parameters for both tools.

I ran infer_experiment.py from RSeQC and got the following result:

This is SingleEnd Data
Fraction of reads failed to determine: 0.0023
Fraction of reads explained by "++,--": 0.0160
Fraction of reads explained by "+-,-+": 0.9817

So, I need to specify --library-type for cuffdiff. If I interpreted the output correctly, I have fr-secondstrand. Is that correct?

For featureCounts I need to specify the -s parameter. It would be -s 2; is that correct?

-s <int>      Indicate if strand-specific read counting should be performed.
                  It has three possible values:  0 (unstranded), 1 (stranded) and
                  2 (reversely stranded). 0 by default.
RNA-Seq • 4.1k views
6.8 years ago
iraun 5.6k

According to the results you have shown, your library type is fr-firststrand. I'd recommend reading the post "Tophat Library-Type : Illumina Truseq Stranded Total Rna Sample Prep Kit" to clarify what is going on. If you still have questions or doubts, you are welcome to ask again :).

Also, fr-firststrand corresponds to -s 2 in featureCounts.


How did you understand from the results above that the library type is fr-firststrand?


So, to ask again: how did you conclude that from the results?


Hi, I got the following results from paired-end data:

This is PairEnd Data
Fraction of reads failed to determine: 0.0020
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0906
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9073

I believe the featureCounts -s option is 2 here as well.
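In case it helps anyone scripting this, here is a minimal sketch of pulling the fractions out of the infer_experiment.py text output shown above. The function name `parse_infer_experiment` and the regular expression are my own assumptions based only on the output format pasted in this thread; they are not part of RSeQC.

```python
import re

# Hypothetical helper (not part of RSeQC): extract the fractions from the
# plain-text report printed by infer_experiment.py. The pattern assumes the
# line format shown in this thread.
_LINE = re.compile(
    r'Fraction of reads (?:explained by "([^"]+)"|failed to determine): ([0-9.]+)'
)

def parse_infer_experiment(text):
    """Return a dict mapping each strand-code string (or "failed") to its fraction."""
    fractions = {}
    for line in text.splitlines():
        match = _LINE.search(line)
        if match:
            # group(1) is None on the "failed to determine" line
            key = match.group(1) or "failed"
            fractions[key] = float(match.group(2))
    return fractions

report = """This is PairEnd Data
Fraction of reads failed to determine: 0.0020
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0906
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9073
"""

fractions = parse_infer_experiment(report)
for code, value in fractions.items():
    print(code, value)
```

The same function should also work on the single-end report, since the only difference there is the strand-code strings ("++,--" and "+-,-+").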

5.9 years ago
Yuka Takemon ▴ 40

12 months too late, but I disagree with @iraun: a fraction of 0.9817 explained by "+-,-+" for single-end sequencing would be unstranded (fr-unstranded). If your reads were stranded, a higher percentage would be explained by "++,--". The documentation isn't the clearest, but you can figure out the pattern of results here: http://rseqc.sourceforge.net/#infer-experiment-py


Actually, this is not correct. The example given for single-end data on the RSeQC website may be a little confusing because it is the opposite of what you typically observe in Illumina stranded libraries (but there is a verbal description of what the strand codes represent).

The two fractions count reads whose strand matches the annotated gene strand ("++,--") and reads on the opposite strand ("+-,-+"). An unstranded library will have a fraction close to 0.5 for both "++,--" and "+-,-+". dUTP Illumina stranded libraries (where the strand of the read is the opposite of the strand of the gene annotation) will have "+-,-+" values close to 1.
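As a rough illustration of that rule, here is a small sketch that turns the two fractions into the settings discussed in this thread. The function name and both cutoffs (0.8 and 0.1) are arbitrary assumptions of mine, not values from RSeQC, cuffdiff, or featureCounts; anything in between is returned as undetermined so you can check it by eye.

```python
def classify_library(same_strand, opposite_strand,
                     stranded_cutoff=0.8, unstranded_tol=0.1):
    """Map infer_experiment.py fractions to (library type, featureCounts -s).

    same_strand:     fraction for "++,--" (or "1++,1--,2+-,2-+" if paired-end)
    opposite_strand: fraction for "+-,-+" (or "1+-,1-+,2++,2--" if paired-end)
    Both cutoffs are illustrative assumptions, not official thresholds.
    Returns None when the fractions fit neither pattern.
    """
    if opposite_strand >= stranded_cutoff:
        # Reads are antisense to the gene, e.g. dUTP stranded libraries
        return ("fr-firststrand", 2)
    if same_strand >= stranded_cutoff:
        # Reads are on the same strand as the gene
        return ("fr-secondstrand", 1)
    if (abs(same_strand - 0.5) <= unstranded_tol
            and abs(opposite_strand - 0.5) <= unstranded_tol):
        return ("fr-unstranded", 0)
    return None

# Single-end result from the original question
print(classify_library(0.0160, 0.9817))
# Paired-end result posted in the comments
print(classify_library(0.0906, 0.9073))
```

Both results posted in this thread land in the first branch, which agrees with fr-firststrand and -s 2.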

You can visualize the alignment of reads to GAPDH in IGV to double-check (if you have single-end data, they will be colored by strand).

So, the answer from iraun is correct.


Oops, you're right, Charles. Thanks for correcting me.
