RSeQC Output from infer_experiment.py - what does it mean?
6.2 years ago
JJ ▴ 670

Hi all,

Can someone help me understand the RSeQC Output from infer_experiment.py?

So this is the output:

This is PairEnd Data
Fraction of reads failed to determine: 0.0560
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0192
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9247

So it's stranded but is it fr-firststrand or fr-secondstrand? I do not understand the help given here:

For pair-end RNA-seq, there are two different ways to strand reads (such as Illumina ScriptSeq protocol):

1++,1--,2+-,2-+

read1 mapped to ‘+’ strand indicates parental gene on ‘+’ strand

read1 mapped to ‘-’ strand indicates parental gene on ‘-’ strand

read2 mapped to ‘+’ strand indicates parental gene on ‘-’ strand

read2 mapped to ‘-’ strand indicates parental gene on ‘+’ strand

1+-,1-+,2++,2--

read1 mapped to ‘+’ strand indicates parental gene on ‘-’ strand

read1 mapped to ‘-’ strand indicates parental gene on ‘+’ strand

read2 mapped to ‘+’ strand indicates parental gene on ‘+’ strand

read2 mapped to ‘-’ strand indicates parental gene on ‘-’ strand

Thanks for your help!

RNA-Seq • 9.6k views
6.2 years ago

It means you have a standard (dUTP-based) strand-specific library. If you want to use featureCounts, you'll want the -s 2 setting. For HTSeq-count it's --stranded=reverse.


Thanks! I would like to use HTSeq-count but also StringTie, which takes either

--rf    Assumes a stranded library fr-firststrand.
--fr    Assumes a stranded library fr-secondstrand.

So it's fr-firststrand, correct?

Could you quickly explain what "1+-,1-+,2++,2--" means?

Thanks!


Yes, you want --rf, which matches dUTP-based methods (the documentation for those options is horrendous).

1+- means that read 1 mapped to the + strand when the gene itself was on the - strand. For dUTP-based methods, read 2 aligns in the same direction as the transcript from which the sequenced fragment arose ("read 2 sets the strand"), so you expect "2++" and "2--" to get more signal than "2+-" and "2-+".
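To make the notation concrete, here is a minimal Python sketch of how each three-character code can be read: read number, then the strand the read mapped to, then the strand of the gene. The helper name `explains` and the two constants are made up for illustration; they are not part of RSeQC.

```python
# The two pattern strings exactly as infer_experiment.py prints them.
SECONDSTRAND_PATTERN = "1++,1--,2+-,2-+"  # read 1 matches the gene strand
FIRSTSTRAND_PATTERN = "1+-,1-+,2++,2--"   # read 2 matches the gene strand (dUTP)


def explains(pattern, read_number, read_strand, gene_strand):
    """Return True if an observation is one of the codes in `pattern`.

    Each code is <read number><strand the read mapped to><strand of the gene>.
    """
    code = f"{read_number}{read_strand}{gene_strand}"
    return code in pattern.split(",")


# Read 1 on '+' with the gene on '-' is the "1+-" case described above.
print(explains(FIRSTSTRAND_PATTERN, 1, "+", "-"))
# Read 2 on '+' with the gene on '+' ("2++") also fits the dUTP pattern.
print(explains(FIRSTSTRAND_PATTERN, 2, "+", "+"))
```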


Hi Devon,

So if "1++,1--,2+-,2-+" gets more signal (the higher fraction), then it's fr-secondstrand,

and if "1+-,1-+,2++,2--" gets more signal (the higher fraction), then it's fr-firststrand.

Am I understanding this right?


Yup. You don't see much "fr-secondstrand" data these days.
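If you want to apply that rule automatically, something like the following Python sketch could work on the text that infer_experiment.py prints. The function names and the 0.8 cutoff are my own assumptions, not anything RSeQC defines; pick a cutoff that suits your data.

```python
import re


def parse_infer_experiment(text):
    """Pull the three fractions out of infer_experiment.py's printed summary."""
    fractions = {}
    for line in text.splitlines():
        match = re.search(r'failed to determine:\s*([\d.]+)', line)
        if match:
            fractions["undetermined"] = float(match.group(1))
            continue
        match = re.search(r'explained by "([^"]+)":\s*([\d.]+)', line)
        if match:
            fractions[match.group(1)] = float(match.group(2))
    return fractions


def classify(fractions, min_fraction=0.8):
    """Name the library type; min_fraction is an arbitrary, assumed cutoff."""
    second = fractions.get("1++,1--,2+-,2-+", 0.0)
    first = fractions.get("1+-,1-+,2++,2--", 0.0)
    if first >= min_fraction:
        return "fr-firststrand"
    if second >= min_fraction:
        return "fr-secondstrand"
    return "unstranded or ambiguous"


# The output posted in the question.
report = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0560
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0192
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9247
'''
print(classify(parse_infer_experiment(report)))
```

With the numbers from the question, about 92% of reads fall under "1+-,1-+,2++,2--", so this reports fr-firststrand. An unstranded library would show the two fractions roughly equal and end up in the last branch.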


Thanks - now I get it! One more question concerning HISAT2 and StringTie. Do I have to set this option when using HISAT2 in order for StringTie (and other tools, for that matter!) to work properly with strand-specific data?

--rna-strandness <string>

Specify strand-specific information: the default is unstranded.
For single-end reads, use F or R. 'F' means a read corresponds to a transcript. 'R' means a read corresponds to the reverse complemented counterpart of a transcript. For paired-end reads, use either FR or RF.
With this option being used, every read alignment will have an XS attribute tag: '+' means a read belongs to a transcript on '+' strand of genome. '-' means a read belongs to a transcript on '-' strand of genome.

The mapping should be the same apart from the XS tag, shouldn't it? Is this tag important for all the downstream tools? Thank you very much for your help!


It ends up not mattering much for the alignments themselves unless you're supplying a GTF file. For StringTie it's quite useful, since the resulting transcripts can then have a meaningful strand assigned to them (which makes things like finding ORFs and performing annotations a bit simpler).
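To keep the settings from this thread in one place, here is a small Python lookup table. The fr-firststrand entries for featureCounts, HTSeq-count and StringTie are the ones given above; the HISAT2 value (RF) and all of the fr-secondstrand entries are filled in from each tool's documentation as I understand it, so check them against the version you run. `STRAND_FLAGS` and `flags_for` are illustrative names only.

```python
# Strandedness options per tool, keyed by library type.
STRAND_FLAGS = {
    "fr-firststrand": {  # dUTP-based libraries, "1+-,1-+,2++,2--" dominates
        "featureCounts": ["-s", "2"],
        "htseq-count": ["--stranded=reverse"],
        "stringtie": ["--rf"],
        "hisat2": ["--rna-strandness", "RF"],
    },
    "fr-secondstrand": {  # "1++,1--,2+-,2-+" dominates
        "featureCounts": ["-s", "1"],
        "htseq-count": ["--stranded=yes"],
        "stringtie": ["--fr"],
        "hisat2": ["--rna-strandness", "FR"],
    },
}


def flags_for(library_type, tool):
    """Return the command-line arguments for one tool and library type."""
    return STRAND_FLAGS[library_type][tool]


# Arguments to append to a HISAT2 command for the library in the question.
print(" ".join(flags_for("fr-firststrand", "hisat2")))
```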
