Question

RSeQC infer_experiment.py results interpretation for cuffdiff and featureCounts

0

8.3 years ago

tonja.r ▴ 600

I have RNA-seq data from mouse ENCODE. Before running cuffdiff and featureCounts, I first need to work out which library type I have so that I can choose the right parameters for both tools.

I ran infer_experiment.py from RSeQC and got the following output:

This is SingleEnd Data
Fraction of reads failed to determine: 0.0023
Fraction of reads explained by "++,--": 0.0160
Fraction of reads explained by "+-,-+": 0.9817

So, I need to specify --library-type for cuffdiff. If I have interpreted the output correctly, I have fr-secondstrand. Is that correct?

For featureCounts I need to specify the -s parameter. It would be -s 2, is that correct?

-s <int>      Indicate if strand-specific read counting should be performed.
                  It has three possible values:  0 (unstranded), 1 (stranded) and
                  2 (reversely stranded). 0 by default.

RNA-Seq • 5.2k views

ADD COMMENT • link updated 20 months ago by Ram 43k • written 8.3 years ago by tonja.r ▴ 600

Ram · Answer 1 · 2016-02-15

1

8.2 years ago

iraun 6.2k

According to the results you have shown, your library type is fr-firststrand. I'd recommend reading the post "Tophat Library-Type : Illumina Truseq Stranded Total Rna Sample Prep Kit" to clarify what is going on. If you still have questions or doubts, you are welcome to ask again :).

Also, fr-firststrand corresponds to -s 2 in featureCounts.
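
To keep the two tools consistent, the correspondence mentioned above can be written down as a small lookup. This is only a minimal sketch: the dictionary and the function name `featurecounts_strand_flag` are hypothetical helpers made up for illustration, not part of cuffdiff or featureCounts. The values follow the featureCounts help text (0 unstranded, 1 stranded, 2 reversely stranded).

```python
# Hypothetical lookup: cuffdiff --library-type value -> featureCounts -s value.
LIBRARY_TYPE_TO_FEATURECOUNTS_S = {
    "fr-unstranded": 0,    # reads are not strand-specific
    "fr-secondstrand": 1,  # read strand matches the gene strand
    "fr-firststrand": 2,   # read strand is opposite to the gene strand (e.g. dUTP)
}


def featurecounts_strand_flag(library_type: str) -> int:
    """Return the featureCounts -s value for a cuffdiff library type."""
    try:
        return LIBRARY_TYPE_TO_FEATURECOUNTS_S[library_type]
    except KeyError:
        known = ", ".join(sorted(LIBRARY_TYPE_TO_FEATURECOUNTS_S))
        raise ValueError(
            f"unknown library type {library_type!r}; expected one of: {known}"
        ) from None


print(featurecounts_strand_flag("fr-firststrand"))
```

Keeping the mapping in one place makes it harder to pass mismatched strand settings to the two programs.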

2

Hi, I got the following results from paired-end data:

This is PairEnd Data
Fraction of reads failed to determine: 0.0020
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0906
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9073

I believe the featureCounts -s option is also 2 here.
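
For anyone who wants to read these fractions in a script rather than by eye, here is a rough sketch of a parser. The function name `parse_infer_experiment` is hypothetical, and the regular expression assumes the output looks exactly like the text pasted above; other RSeQC versions may word it differently.

```python
import re

# Matches both the "failed to determine" line and the 'explained by "..."' lines.
_FRACTION_LINE = re.compile(
    r'\s*Fraction of reads (?:failed to determine|explained by "([^"]+)"):\s*([0-9.]+)'
)


def parse_infer_experiment(text: str) -> dict:
    """Return {strand code: fraction}; the undetermined fraction is stored under "failed"."""
    fractions = {}
    for line in text.splitlines():
        match = _FRACTION_LINE.match(line)
        if match:
            key = match.group(1) if match.group(1) else "failed"
            fractions[key] = float(match.group(2))
    return fractions


paired_output = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0020
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0906
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9073
'''

parsed = parse_infer_experiment(paired_output)
for code, fraction in parsed.items():
    print(code, fraction)
```

The same function should also handle the single-end output, where the strand codes are "++,--" and "+-,-+".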

ADD REPLY • link 3.1 years ago by DareDevil ★ 4.3k

0

How did you work out from the results above that the library type is fr-firststrand?

ADD REPLY • link updated 21 months ago by Ram 43k • written 8.2 years ago by tonja.r ▴ 600

0

So, to ask again: how did you work that out from the results?

ADD REPLY • link 7.7 years ago by tonja.r ▴ 600

score 1 · Answer 2 · 2017-01-13

1

7.3 years ago

Yuka Takemon ▴ 40

12 months too late, but I disagree with @iraun: for single-end sequencing, Fraction of reads explained by "+-,-+": 0.9817 would mean unstranded (fr-unstranded). If your reads were stranded, a higher percentage would be explained by "++,--". The documentation isn't the clearest, but you can figure out the pattern of results here: http://rseqc.sourceforge.net/#infer-experiment-py

2

Actually, this is not correct. The example given for single-end data on the RSeQC website may be a little confusing because it is the opposite of what you typically observe in Illumina stranded libraries (although there is a verbal description of what the strand codes represent).

The fractions count reads that map to the same strand as the gene annotation ("++,--") versus reads that map to the opposite strand ("+-,-+"). An unstranded library will have a fraction close to 0.5 for both "++,--" and "+-,-+". dUTP Illumina stranded libraries (where the strand of the read is the opposite of the strand of the gene annotation) will have "+-,-+" values close to 1.

You can visualize the alignment of reads to GAPDH in IGV to double-check (if you have single-end data, they will be colored by strand).

So, the answer from iraun is correct.
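
The rule of thumb above can be sketched as a small function. This is an illustration only: `classify_strandedness` is a hypothetical name, and the 0.8 cutoff is an arbitrary assumption, not something RSeQC, cuffdiff or featureCounts defines. Values that fall between the cutoffs but far from 0.5 are worth checking by hand (for example in IGV) rather than trusting the label.

```python
def classify_strandedness(same_strand: float, opposite_strand: float,
                          cutoff: float = 0.8) -> dict:
    """Suggest strand settings from infer_experiment.py fractions.

    same_strand:     fraction for "++,--" (or "1++,1--,2+-,2-+" for paired-end)
    opposite_strand: fraction for "+-,-+" (or "1+-,1-+,2++,2--" for paired-end)
    cutoff:          assumed threshold for calling a library stranded
    """
    determined = same_strand + opposite_strand
    if determined <= 0:
        raise ValueError("no reads could be assigned to either strand pattern")

    # Share of the determined reads that lie opposite to the annotated gene strand.
    opposite_share = opposite_strand / determined

    if opposite_share >= cutoff:
        return {"library_type": "fr-firststrand", "featurecounts_s": 2}
    if opposite_share <= 1 - cutoff:
        return {"library_type": "fr-secondstrand", "featurecounts_s": 1}
    return {"library_type": "fr-unstranded", "featurecounts_s": 0}


# Single-end fractions from the question, then the paired-end fractions from the reply.
print(classify_strandedness(0.0160, 0.9817))
print(classify_strandedness(0.0906, 0.9073))
```

Both sets of numbers in this thread land in the first branch, which agrees with fr-firststrand and -s 2.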

ADD REPLY • link 7.3 years ago by Charles Warden 8.2k

0

Oops, you're right, Charles. Thanks for correcting me.

ADD REPLY • link 7.3 years ago by Yuka Takemon ▴ 40