Question: What is the impact of library strandedness on the set of differentially expressed genes?
bioinforesearchquestions (United States) wrote, 10 months ago:

Hi,

I have paired-end RNA-seq samples (FASTQ) and know the name of the sequencing kit, but I have no information about strandedness. I am not sure whether the library was prepared with a strand-specific or a non-stranded protocol.

I performed two different analyses:

1) Assuming a stranded "reverse" library for hisat2 and htseq-count: around 850 differentially expressed genes.

2) Assuming a non-stranded library for hisat2 and htseq-count: around 1100 differentially expressed genes.

- 94% of the DE genes from the stranded analysis are also found in the non-stranded analysis.
- 6% of the DE genes from the stranded analysis are not found in the non-stranded analysis.
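For reference, these percentages are relative to the stranded DE set: roughly 800 of the ~850 stranded DE genes are shared, so the non-stranded run contributes about 300 genes of its own, and measured against the non-stranded set the same overlap is only about 73%. A minimal Python sketch of that comparison, using made-up gene IDs (the function name `overlap_fractions` is hypothetical):

```python
def overlap_fractions(reference_de, other_de):
    """Return (shared, reference-only) fractions, relative to reference_de."""
    reference_de = set(reference_de)
    other_de = set(other_de)
    if not reference_de:
        raise ValueError("reference DE set is empty")
    shared = len(reference_de & other_de) / len(reference_de)
    return shared, 1.0 - shared

# Hypothetical gene IDs, for illustration only
stranded = {"geneA", "geneB", "geneC", "geneD"}
unstranded = {"geneA", "geneB", "geneC", "geneE", "geneF"}

shared, only_stranded = overlap_fractions(stranded, unstranded)
print(f"shared: {shared:.0%}, stranded-only: {only_stranded:.0%}")
# The overlap is asymmetric: swap the arguments to measure it against the other set
print(f"shared (vs. unstranded set): {overlap_fractions(unstranded, stranded)[0]:.0%}")
```

Reporting the overlap in both directions avoids reading a high percentage of the smaller list as agreement between the two analyses.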

Later, I came across the RSeQC tool infer_experiment.py, which infers strandedness from aligned reads.

Output of that tool:

- Fraction of reads failed to determine : 0.0580
- Fraction of reads explained by "1++,1--,2+-,2-+" : 0.4724
- Fraction of reads explained by "1+-,1-+,2++,2--" : 0.4695

I concluded that the library is non-stranded. Am I correct?

If I proceed with the output of the stranded analysis, would that be a serious mistake?

Tags: stranded de expression rnaseq
h.mon (Brazil) wrote, 10 months ago:

You should use htseq-count with the unstranded setting (-s no), because according to RSeQC your libraries are unstranded.
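A minimal Python sketch of how the two infer_experiment.py fractions for paired-end data can be turned into a decision. "1++,1--,2+-,2-+" means read 1 lies on the gene's strand (FR: hisat2 `--rna-strandness FR`, htseq-count `-s yes`), and "1+-,1-+,2++,2--" means read 1 lies on the opposite strand (RF: `--rna-strandness RF`, `-s reverse`). The function name and the 0.8 / 0.1 cutoffs are illustrative assumptions, not RSeQC defaults. Note that htseq-count defaults to `-s yes`, so the unstranded setting has to be requested explicitly.

```python
def infer_strandedness(frac_fr, frac_rf, stranded_cutoff=0.8, tolerance=0.1):
    """Classify a paired-end library from RSeQC infer_experiment.py fractions.

    frac_fr: fraction explained by "1++,1--,2+-,2-+" (read 1 on the gene's strand)
    frac_rf: fraction explained by "1+-,1-+,2++,2--" (read 1 opposite the gene)
    The cutoffs are illustrative assumptions, not RSeQC defaults.
    """
    explained = frac_fr + frac_rf
    if explained == 0:
        return "undetermined", None
    # Normalise so reads that "failed to determine" do not skew the ratio
    share_fr = frac_fr / explained
    if share_fr >= stranded_cutoff:
        return "stranded (FR)", {"hisat2": "--rna-strandness FR", "htseq-count": "-s yes"}
    if share_fr <= 1 - stranded_cutoff:
        return "stranded (RF)", {"hisat2": "--rna-strandness RF", "htseq-count": "-s reverse"}
    if abs(share_fr - 0.5) <= tolerance:
        # hisat2 treats reads as unstranded when --rna-strandness is omitted
        return "unstranded", {"hisat2": "(omit --rna-strandness)", "htseq-count": "-s no"}
    return "unclear", None

# Fractions reported in the question: ~0.47 vs ~0.47
label, flags = infer_strandedness(0.4724, 0.4695)
print(label, flags)  # label is "unstranded" for these fractions
```

Fractions this close to 50/50 are the typical signature of an unstranded library; a dUTP-style stranded kit would instead put most reads in the "1+-,1-+,2++,2--" group.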

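If you run htseq-count with a stranded setting on unstranded libraries, reads mapping to a region where features on opposite strands overlap will be assigned unambiguously to one of those features, according to the strand the read mapped to. However, this assignment may be incorrect: in an unstranded library the mapping strand says nothing about which strand the RNA was transcribed from, so those reads could have originated from any of the overlapping features. When you use the correct unstranded setting, htseq-count will not count those reads for any feature, because it considers their assignment ambiguous. In addition, with -s reverse on an unstranded library, roughly half of the reads from every gene map in the "wrong" orientation and are not counted for that gene (they usually end up as __no_feature), which lowers counts across the board.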
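The strand logic above can be illustrated with a toy model in Python. This is a hypothetical sketch, not HTSeq's implementation: it ignores CIGAR strings, mates and the union/intersection modes, and simply treats a read as one interval whose strand is that of read 1. The `Feature`, `strand_ok` and `assign` names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    start: int   # 0-based, half-open [start, end)
    end: int
    strand: str  # "+" or "-"


OPPOSITE = {"+": "-", "-": "+"}


def strand_ok(feature_strand, read_strand, stranded):
    """Mimic the meaning of htseq-count's -s yes / reverse / no."""
    if stranded == "no":
        return True  # strand is ignored entirely
    if stranded == "yes":
        return feature_strand == read_strand
    if stranded == "reverse":
        return feature_strand == OPPOSITE[read_strand]
    raise ValueError(f"unknown stranded setting: {stranded!r}")


def assign(read_start, read_end, read_strand, features, stranded):
    """Toy assignment of one read (or read 1 of a pair); not HTSeq's code."""
    hits = {
        f.name
        for f in features
        if f.start < read_end and read_start < f.end  # intervals overlap
        and strand_ok(f.strand, read_strand, stranded)
    }
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return next(iter(hits))


# Hypothetical genes: geneA on "+" overlaps geneB on "-" in [800, 1000)
genes = [Feature("geneA", 0, 1000, "+"), Feature("geneB", 800, 2000, "-")]

for setting in ("no", "yes", "reverse"):
    overlap_read = assign(850, 950, "+", genes, setting)  # inside the overlap
    unique_read = assign(100, 200, "+", genes, setting)   # only inside geneA
    print(setting, overlap_read, unique_read)
```

With `no`, the read in the overlap is `__ambiguous` while the read unique to geneA is counted for geneA. With `reverse`, the overlap read is confidently (and, for an unstranded library, arbitrarily) given to geneB, and the read that truly lies only in geneA is lost as `__no_feature`.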

The HTSeq manual has a nicer (and longer) explanation; here is the snippet for __ambiguous:

__ambiguous: reads (or read pairs) which could have been assigned to more than one feature and hence were not counted for any of these, unless the --nonunique all option was used (set S had more than one element).
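One quick way to see the effect of the setting is to compare the special counters (lines starting with `__`) at the end of two htseq-count outputs. If the library is unstranded, a run with `-s reverse` should show a much larger `__no_feature` fraction than a run with `-s no`. A minimal Python sketch, assuming one count file per sample in the default two-column, tab-separated layout; the function names and the counts in the example are invented:

```python
import io


def read_htseq_counts(handle):
    """Split htseq-count output into per-gene counts and special counters."""
    genes, special = {}, {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        parts = line.split("\t")
        # Assumes a single sample: feature ID first, count in the last column
        name, count = parts[0], int(parts[-1])
        target = special if name.startswith("__") else genes
        target[name] = count
    return genes, special


def summary(genes, special):
    """Fractions of all reads that were assigned or fell into each counter."""
    assigned = sum(genes.values())
    total = assigned + sum(special.values())
    return {"assigned": assigned / total,
            **{name: n / total for name, n in special.items()}}


# Invented counts, standing in for a real htseq-count output file
reverse_run = io.StringIO(
    "geneA\t40\ngeneB\t60\n__no_feature\t90\n__ambiguous\t2\n"
    "__too_low_aQual\t0\n__not_aligned\t0\n__alignment_not_unique\t8\n"
)
genes, special = read_htseq_counts(reverse_run)
for name, frac in summary(genes, special).items():
    print(f"{name}: {frac:.1%}")
```

With real data, open the two count files and print both summaries side by side.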


Thanks for the comments, h.mon. Yes, the previous person on the team assumed the library was stranded and did the analyses on that basis. When I reran the analyses, I found it to be non-stranded. I just wanted to confirm that my approach is correct.

(Reply by bioinforesearchquestions, 9 months ago)
Powered by Biostar version 2.3.0