Question

What will be the impact of strandedness on differentially expressed genes?

4.7 years ago

bioinforesearchquestions ▴ 370

Hi,

I have RNA-seq samples (paired-end FASTQ) and I know the name of the sequencing kit, but I have no information about strandedness: I am not sure whether the library was prepared with a strand-specific or a non-stranded protocol.

I performed two different analyses:

1) Assuming stranded "reverse" for hisat2 and htseq-count (output: around 850 differentially expressed genes)

2) Assuming non-stranded for hisat2 and htseq-count (output: around 1100 differentially expressed genes)

- 94% of the genes from the stranded approach are also found by the non-stranded approach.
- 6% of the genes from the stranded approach are not found by the non-stranded approach.
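For reference, overlap figures like these can be computed with plain set arithmetic. This is only a minimal sketch: the function name and the gene IDs in the usage lines are hypothetical placeholders, not my actual results.

```python
def overlap_summary(stranded_genes, unstranded_genes):
    """Compare two DE gene lists and report how many genes they share."""
    stranded = set(stranded_genes)
    unstranded = set(unstranded_genes)
    shared = stranded & unstranded
    pct = 100.0 * len(shared) / len(stranded) if stranded else 0.0
    return {
        "shared": len(shared),
        "stranded_only": len(stranded - unstranded),
        "unstranded_only": len(unstranded - stranded),
        "pct_of_stranded_shared": pct,
    }

# Hypothetical gene IDs, for illustration only
summary = overlap_summary(["g1", "g2", "g3", "g4"], ["g1", "g2", "g3", "g5", "g6"])
print(summary)
```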

Later, I came across the RSeQC tool for identifying strandedness.

Output of that tool:

- Fraction of reads failed to determine : 0.0580
- Fraction of reads explained by "1++,1--,2+-,2-+" : 0.4724
- Fraction of reads explained by "1+-,1-+,2++,2--" : 0.4695

I concluded that this library is non-stranded. Am I correct?

If I proceed with the output of the stranded approach, is that a big blunder?

RNAseq stranded expression DE • 953 views

updated 4.7 years ago by h.mon 35k • written 4.7 years ago by bioinforesearchquestions ▴ 370

score 1 · Answer 1 · 2019-08-07

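You should use htseq-count with the unstranded setting (-s no): according to RSeQC your libraries are unstranded, since the two read orientations explain nearly equal fractions of the reads (0.4724 vs 0.4695).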
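As a rough illustration of that reading of the RSeQC numbers, here is a minimal sketch that maps the two fractions to an htseq-count -s value. The function name and both cutoffs (0.2 and 0.6) are my own assumptions, not anything RSeQC or htseq-count defines; borderline libraries need a closer look rather than a threshold.

```python
def classify_strandedness(frac_same, frac_opposite,
                          unstranded_below=0.2, stranded_above=0.6):
    """Suggest an htseq-count -s value from two RSeQC fractions.

    frac_same:     fraction explained by "1++,1--,2+-,2-+"
    frac_opposite: fraction explained by "1+-,1-+,2++,2--"
    The two cutoffs are hypothetical and only serve this illustration.
    """
    total = frac_same + frac_opposite
    if total == 0:
        return "undetermined"
    # Normalised gap between the two orientations, from -1 to 1
    gap = (frac_same - frac_opposite) / total
    if abs(gap) < unstranded_below:
        return "no"
    if abs(gap) >= stranded_above:
        return "yes" if gap > 0 else "reverse"
    return "unclear"

# Fractions reported in the question
print(classify_strandedness(0.4724, 0.4695))
```

With the fractions from the question, the gap between the two orientations is well under 1%, so the sketch returns "no".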

If you use htseq-count with a stranded setting on unstranded libraries, reads mapping to overlapping features on opposite strands will be assigned unambiguously to one feature, according to the strand they map to. However, this assignment may be incorrect, because those reads could have originated from any of the overlapping features. When you use the correct unstranded setting, htseq-count will not count those reads for any feature, because it considers their assignment ambiguous.
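The difference can be sketched with a toy model. This is not htseq-count's actual code: the two genes, their coordinates and the assign function are hypothetical, and read_strand stands for the strand of origin that a stranded setting would infer for the read.

```python
# Two hypothetical genes that overlap on opposite strands: (start, end, strand)
GENES = {"geneA": (100, 500, "+"), "geneB": (400, 800, "-")}

def assign(read_start, read_end, read_strand, stranded):
    """Toy union-style assignment of a single read to a feature."""
    hits = set()
    for name, (start, end, strand) in GENES.items():
        overlaps = read_start <= end and read_end >= start
        # A stranded setting only accepts features on the inferred strand
        if overlaps and (not stranded or strand == read_strand):
            hits.add(name)
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return next(iter(hits))

# A read inside the region shared by both genes
print(assign(420, 470, "+", stranded=True))
print(assign(420, 470, "+", stranded=False))
```

Under the stranded setting the read is credited to geneA alone, even though in an unstranded library it could just as well come from geneB; under the unstranded setting it is reported as ambiguous.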

The htseq manual has a nicer (and longer) explanation, here is the snippet for __ambiguous:

__ambiguous: reads (or read pairs) which could have been assigned to more than one feature and hence were not counted for any of these, unless the --nonunique all option was used (set S had more than one element).
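To see how much this matters for your data, you can compare the special counters at the end of the two htseq-count outputs. A minimal sketch, assuming the usual two-column tab-separated table; the function name is mine and the counts in the example are made up.

```python
import csv
import io

def read_special_counters(handle):
    """Return the '__'-prefixed summary rows of an htseq-count table as a dict."""
    counters = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2 and row[0].startswith("__"):
            counters[row[0]] = int(row[1])
    return counters

# Made-up counts, for illustration only
example = io.StringIO("geneA\t120\ngeneB\t45\n__no_feature\t300\n__ambiguous\t25\n")
print(read_special_counters(example))
```

Running this on the stranded and unstranded count files side by side shows how many reads move between __ambiguous, __no_feature and the gene rows when the setting changes.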