Question: What will be the impact of strandedness on differentially expressed genes?
14 months ago by
United States
bioinforesearchquestions280 wrote:

Hi,

I have RNA-seq samples (paired-end FASTQ) and know the name of the sequencing kit, but I have no information about strandedness: I am not sure whether the library preparation was strand-specific or non-stranded.

I performed two different analyses:

1) Assuming stranded "reverse" for hisat2 and htseq-count (output: around 850 differentially expressed genes)

2) Assuming non-stranded for hisat2 and htseq-count (output: around 1100 differentially expressed genes)

- 94% of the genes from the stranded approach also appear in the non-stranded results.
- 6% of the genes from the stranded approach do not appear in the non-stranded results.

Later, I came across the RSeQC tool for identifying strandedness.

Output of that tool:

- Fraction of reads failed to determine : 0.0580
- Fraction of reads explained by "1++,1--,2+-,2-+" : 0.4724
- Fraction of reads explained by "1+-,1-+,2++,2--" : 0.4695

I concluded that these libraries are non-stranded. Am I correct?

If I proceed with the output of the stranded approach, would that be a big blunder?

stranded de expression rnaseq • 323 views
14 months ago by
h.mon31k
Brazil
h.mon31k wrote:

You should use htseq-count with the unstranded setting: according to RSeQC, your libraries are unstranded.
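To make the reading of those RSeQC fractions explicit, here is a minimal Python sketch. It assumes the numbers come from RSeQC's `infer_experiment.py` on paired-end data; the function name `classify_strandedness` and the 0.8 cut-off are my own assumptions (a common rule of thumb, not something RSeQC or htseq-count defines).

```python
def classify_strandedness(frac_forward, frac_reverse, threshold=0.8):
    """Guess the library type from the two 'explained' fractions.

    frac_forward: fraction explained by "1++,1--,2+-,2-+"
    frac_reverse: fraction explained by "1+-,1-+,2++,2--"
    threshold:    assumed cut-off for calling a library stranded
    """
    explained = frac_forward + frac_reverse
    if explained == 0:
        return "undetermined"
    # Share of the explained reads that follow the forward pattern.
    share_forward = frac_forward / explained
    if share_forward >= threshold:
        return "forward"      # would correspond to htseq-count --stranded=yes
    if share_forward <= 1 - threshold:
        return "reverse"      # would correspond to htseq-count --stranded=reverse
    return "unstranded"       # would correspond to htseq-count --stranded=no


# Fractions reported in the question.
result = classify_strandedness(0.4724, 0.4695)
print(result)
```

With the two fractions almost equal, roughly half of the explained reads follow each pattern, which is what an unstranded library is expected to look like.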

If you use htseq-count with a stranded setting on unstranded libraries, reads mapping to overlapping features on opposite strands will be assigned unambiguously to one feature, according to the strand they mapped to. However, this assignment may be incorrect, because those reads could have originated from either of the overlapping features. When you use the correct unstranded setting, htseq-count will not count those reads for any feature, because it considers their assignment ambiguous.

The htseq-count manual has a nicer (and longer) explanation; here is the snippet for __ambiguous:

__ambiguous: reads (or read pairs) which could have been assigned to more than one feature and hence were not counted for any of these, unless the --nonunique all option was used (set S had more than one element).
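A toy Python sketch of this behaviour may help. It is not htseq-count's actual implementation: the two genes, their coordinates, the reads, and the function `count_reads` are all hypothetical, and each read is reduced to a single position plus a strand for simplicity.

```python
from collections import Counter

# Hypothetical annotation: two genes that overlap on opposite strands.
FEATURES = [
    ("geneA", 100, 200, "+"),
    ("geneB", 150, 250, "-"),
]


def count_reads(reads, stranded):
    """Count reads per feature; stranded is "no", "yes" or "reverse"."""
    counts = Counter()
    for pos, strand in reads:
        hits = set()
        for name, start, end, feat_strand in FEATURES:
            if not (start <= pos <= end):
                continue
            if stranded == "no":
                hits.add(name)
            elif stranded == "yes" and strand == feat_strand:
                hits.add(name)
            elif stranded == "reverse" and strand != feat_strand:
                hits.add(name)
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Reads from an unstranded library: the mapped strand carries no information.
reads = [(120, "+"), (160, "+"), (170, "-"), (230, "-")]

unstranded = count_reads(reads, "no")
reverse = count_reads(reads, "reverse")
print("unstranded:", dict(unstranded))
print("reverse:   ", dict(reverse))
```

In this toy example the two reads inside the overlap (positions 160 and 170) are reported as `__ambiguous` in unstranded mode, whereas in "reverse" mode each is assigned to one gene purely by the strand it mapped to, which is the possibly incorrect assignment described above.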


Thanks for the comments, h.mon. Yes, the previous person on the team assumed the libraries were stranded and ran the analyses that way. When I reran the analyses, I found them to be non-stranded. I just wanted to confirm that my approach was right.

Reply written 14 months ago by bioinforesearchquestions280