What will be the impact of strandedness on differentially expressed genes?
4.9 years ago

Hi,

I have RNA-seq samples (paired-end FASTQ) and I know the name of the sequencing kit, but I have no information about strandedness. I am not sure whether the library preparation was strand-specific or non-stranded.

I performed two different analyses:

1) Assuming stranded "reverse" for hisat2 and htseq-count (output: around 850 differentially expressed genes)

2) Assuming non-stranded for hisat2 and htseq-count (output: around 1100 differentially expressed genes)

- 94% of the genes from the stranded approach also appear in the non-stranded results.
- 6% of the genes from the stranded approach do not appear in the non-stranded results.
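The overlap percentages above can be computed with a simple set comparison. This is a minimal sketch; the gene identifiers are hypothetical placeholders, not results from this analysis, and `shared_fraction` is an illustrative helper name.

```python
def shared_fraction(reference, other):
    """Fraction of genes in `reference` that are also present in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        raise ValueError("reference gene list is empty")
    return len(reference & other) / len(reference)


# Hypothetical DE gene lists, for illustration only.
stranded_de = {"gene1", "gene2", "gene3", "gene4"}
unstranded_de = {"gene1", "gene2", "gene3", "gene5", "gene6"}

print(shared_fraction(stranded_de, unstranded_de))  # 0.75
```

Note that the fraction depends on which list is used as the reference, so it is worth stating the denominator when reporting it.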

Later, I came across the RSeQC tool (infer_experiment.py) for identifying strandedness.

Output of that tool:

- Fraction of reads failed to determine : 0.0580
- Fraction of reads explained by "1++,1--,2+-,2-+" : 0.4724
- Fraction of reads explained by "1+-,1-+,2++,2--" : 0.4695

I concluded that the libraries are non-stranded. Am I correct?

If I proceed with the output of the stranded approach, is that a big blunder?

RNAseq stranded expression DE • 991 views
4.9 years ago
h.mon 35k

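You should use htseq-count with the unstranded setting (--stranded=no), because the RSeQC output indicates that your libraries are unstranded: the two orientations each explain roughly half of the reads.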
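The reasoning behind that call can be sketched as a small rule: a strongly dominant orientation suggests a stranded library, while two roughly equal fractions suggest an unstranded one. The function name and the cut-offs (0.8 for "dominant", 0.1 for "balanced", 0.5 for minimum determined reads) are assumptions chosen for illustration, not values defined by RSeQC or HTSeq.

```python
def infer_htseq_stranded(frac_fwd, frac_rev, dominant=0.8, balanced=0.1):
    """Suggest an htseq-count --stranded value from infer_experiment.py fractions.

    frac_fwd: fraction of reads explained by "1++,1--,2+-,2-+"
    frac_rev: fraction of reads explained by "1+-,1-+,2++,2--"
    The thresholds are illustrative assumptions, not official cut-offs.
    """
    if frac_fwd + frac_rev < 0.5:
        # Too many reads could not be assigned to either orientation.
        return "undetermined"
    if frac_fwd >= dominant:
        return "yes"
    if frac_rev >= dominant:
        return "reverse"
    if abs(frac_fwd - frac_rev) <= balanced:
        return "no"
    return "undetermined"


# Fractions reported in the question.
print(infer_htseq_stranded(0.4724, 0.4695))  # no
```

An "undetermined" result would be a reason to inspect the data more closely rather than to pick a setting automatically.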

If you run htseq-count with a stranded setting on unstranded libraries, reads mapping to features that overlap on opposite strands will be assigned unambiguously to one feature, according to the strand they mapped to. However, this assignment may be incorrect, because in an unstranded library those reads could have originated from either of the overlapping features. In addition, roughly half of the reads from any gene will appear to come from the opposite strand and will not be counted for that gene. When you use the correct unstranded setting, htseq-count will not count reads in overlapping regions for any feature, because it considers their assignment ambiguous.

The HTSeq manual has a nicer (and longer) explanation; here is the snippet for __ambiguous:

__ambiguous: reads (or read pairs) which could have been assigned to more than one feature and hence were not counted for any of these, unless the --nonunique all option was used (set S had more than one element).
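The effect of the strandedness setting on overlapping features can be illustrated with a toy counter. This is a simplified sketch of union-style counting, not HTSeq's implementation: the two genes, their coordinates, and the reads are hypothetical, and each read's strand stands for the strand as htseq-count would interpret it.

```python
from collections import Counter

# Hypothetical annotation: two genes overlapping on opposite strands.
FEATURES = {
    "geneA": (100, 200, "+"),
    "geneB": (150, 250, "-"),
}


def count_reads(reads, features, stranded):
    """Assign each read to the single feature it overlaps, if there is exactly one."""
    counts = Counter()
    for start, end, strand in reads:
        hits = {
            name
            for name, (f_start, f_end, f_strand) in features.items()
            if start <= f_end and end >= f_start
            and (not stranded or strand == f_strand)
        }
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[hits.pop()] += 1
    return counts


# An unstranded library yields both orientations at every locus.
READS = [
    (160, 190, "+"),  # inside the geneA/geneB overlap
    (160, 190, "-"),  # inside the geneA/geneB overlap
    (110, 140, "+"),  # geneA only
    (110, 140, "-"),  # geneA only, opposite orientation
]

stranded_counts = count_reads(READS, FEATURES, stranded=True)
unstranded_counts = count_reads(READS, FEATURES, stranded=False)
print(dict(stranded_counts))
print(dict(unstranded_counts))
```

In stranded mode the two overlap reads are split between geneA and geneB purely by orientation, and one geneA read is lost to __no_feature; in unstranded mode the overlap reads become __ambiguous and both geneA-only reads are counted.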


Thanks for the comments, h.mon. Yes, the previous person on the team assumed the libraries were stranded and ran the analyses that way. When I reran the analyses, I worked out that they are non-stranded, and I just wanted to confirm my approach.
