Question: HTSeq Read Count And Strand-Specificity
Asked 7.8 years ago by thecuriousbiologist (United States):


I am new to RNA sequencing and a bit confused by the HT-Seq read count options, and I want to know whether I am thinking in the right direction. I have a set of paired-end strand-specific RNA-Seq reads and I am now trying to count the reads in a set of features (genes).

The HT-Seq documentation says that the option "stranded" is set to "yes" by default, which means that HT-Seq assumes the reads to be strand-specific. It also says:

"If your RNA-Seq data has not been made with a strand-specific protocol, this causes half of the reads to be lost. Hence, make sure to set the option --stranded=no unless you have strand-specific data!"

This makes sense, since if I use the "stranded=yes" option for non-strand-specific data, the reads mapping to the opposite strand of the feature will NOT be counted.

However, this makes me wonder: if I use "stranded=no" even for strand-specific data, would it not affect my counts in any way? With "stranded=no", it does not matter whether a read maps to the same or the opposite strand as the feature. It would be counted as long as it maps to a feature, regardless of the strand.

So then a follow up question comes to mind as to why HT-Seq even has the "stranded=yes" and "stranded=reverse" options.

I am sorry if this is a very naive and incorrect question, but I really need to get the strand-specific concept clear in my mind.

Any help would be much appreciated.

htseq read rna counts strand • 8.3k views
Answered 7.8 years ago by Ido Tamir:

If the transcripts/genes did not overlap, you would get the same results whether you specify stranded=yes or stranded=no. But sometimes exons overlap (at least in mammals), and they do so on opposite strands, which allows htseq-count to differentiate between the two genes/transcripts if the input was stranded. With stranded=no, a read in such a region matches both features and is reported as ambiguous instead of being counted for either gene, so you should see a higher rate of ambiguous reads when using unstranded.
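To make the overlap argument concrete, here is a minimal toy sketch in Python. It is not HTSeq code: the two genes, their coordinates, the reads, and the helper names `assign` and `count` are all made up for illustration, and it treats reads as single-end intervals instead of handling paired-end mates. It only mimics the idea that a read matching more than one feature is reported as ambiguous.

```python
from collections import Counter

# Hypothetical toy annotation: (name, start, end, strand), end-exclusive.
GENES = [
    ("geneA", 100, 200, "+"),
    ("geneB", 150, 250, "-"),  # overlaps geneA on the opposite strand
]


def assign(read_start, read_end, read_strand, stranded):
    """Return the gene a read is counted for, or a special category."""
    hits = set()
    for name, start, end, strand in GENES:
        if not (read_start < end and start < read_end):
            continue  # read does not overlap this gene at all
        if stranded == "no":
            hits.add(name)  # strand is ignored
        elif stranded == "yes" and read_strand == strand:
            hits.add(name)  # read must be on the gene's strand
        elif stranded == "reverse" and read_strand != strand:
            hits.add(name)  # read must be on the opposite strand
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return hits.pop()


def count(reads, stranded):
    """Tally reads per gene or special category for one stranded setting."""
    return Counter(assign(s, e, strand, stranded) for s, e, strand in reads)


# Hypothetical reads: (start, end, strand).
READS = [
    (110, 140, "+"),  # geneA only
    (160, 190, "+"),  # overlap region, geneA's strand
    (160, 190, "-"),  # overlap region, geneB's strand
    (210, 240, "-"),  # geneB only
]

stranded_counts = count(READS, "yes")
unstranded_counts = count(READS, "no")
print("stranded=yes:", dict(stranded_counts))
print("stranded=no: ", dict(unstranded_counts))
```

In this toy case the two reads inside the overlap are resolved to geneA and geneB with stranded=yes, while with stranded=no both end up in the ambiguous bin and neither gene gets them.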

Depending on the protocol, either the sense or the antisense strand gets sequenced, which makes the reverse option necessary. Here is a not completely illuminating figure (a little more colour would have been nice to see which strand gets sequenced).

[Figure. Image credit: Zhao Zhang]
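If you are unsure which orientation your protocol produces, one practical check is to run htseq-count on the same alignments once with stranded=yes and once with stranded=reverse and compare how many reads get assigned to genes. The sketch below assumes the usual two-column, tab-separated single-sample output in which the special counters start with `__` (as in recent HTSeq versions). The function names, the example numbers, and the factor-of-5 cutoff are my own assumptions, not anything HTSeq prescribes.

```python
def summarize(lines):
    """Split htseq-count output lines into an assigned total and special counters."""
    assigned = 0
    special = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, value = line.split("\t")[:2]
        if name.startswith("__"):
            special[name] = int(value)  # e.g. __no_feature, __ambiguous
        else:
            assigned += int(value)  # a regular per-gene count
    return assigned, special


def assigned_fraction(lines):
    """Fraction of all tallied reads that were assigned to some gene."""
    assigned, special = summarize(lines)
    total = assigned + sum(special.values())
    return assigned / total if total else 0.0


def guess_setting(yes_lines, reverse_lines, ratio=5.0):
    """Guess the stranded setting; `ratio` is an arbitrary illustrative cutoff."""
    yes_frac = assigned_fraction(yes_lines)
    rev_frac = assigned_fraction(reverse_lines)
    if yes_frac > ratio * rev_frac:
        return "yes"
    if rev_frac > ratio * yes_frac:
        return "reverse"
    return "no"  # similar fractions suggest an unstranded library


# Made-up example output from two hypothetical runs on the same reads.
yes_run = ["geneA\t90", "geneB\t80", "__no_feature\t20", "__ambiguous\t10"]
reverse_run = ["geneA\t5", "geneB\t3", "__no_feature\t190", "__ambiguous\t2"]

guess = guess_setting(yes_run, reverse_run)
print("suggested --stranded setting:", guess)
```

Running both modes on a subsample of the reads is usually enough for this comparison. If the two fractions come out similar, the library is probably not strand-specific.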

And no, it's not naive. It is confusing and complicated with all these strands, protocols, etc.
