Question: Using htseq-count when a stranded library was assumed to be unstranded during tophat2 mapping
gravatar for cajawe
3.8 years ago by
cajawe0 wrote:

Dear Community,

I am trying to generate RNAseq "feature count" tables with htseq-count, and I have run into an issue regarding how to properly deal with stranded sequence data.

Background: Paired-end RNA libraries were prepared using a stranded protocol (Illumina TruSeq Stranded mRNA Sample Preparation Kit), and the reads were mapped to the reference genome using tophat2.  I was provided the tophat2 accepted-hits.bam files, and asked to conduct differential expression analysis, which I was hoping to do via DESeq2 after first generating feature count tables using htseq-count.  

Problem: I've learned that the tophat2 mapping was simply conducted using default settings, which means that the library type was assumed to be unstranded ("--library-type fr-unstranded"), and not stranded ("--library-type fr-firststrand").  

Questions: Since the mappings were performed under the assumption that reads were unstranded, should I continue to assume likewise while running htseq-count (via "--stranded=no")?  Or should I request that the data be remapped using the correct library type in tophat2, and only then proceed with htseq-count (using, I believe,  "--stranded=reverse")? 

Many thanks!

ADD COMMENTlink modified 3.8 years ago by Istvan Albert ♦♦ 81k • written 3.8 years ago by cajawe0
gravatar for Istvan Albert
3.8 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

My understanding based on documentation is that passing he --library-type parameter to tophat just adds an XS tag to each alignment.

This is tag is almost certainly not used by htseq since it is an aligner specific tag that is not guaranteed to be present for all aligners. So IMO you don't need to realign the data.

ADD COMMENTlink modified 3.8 years ago • written 3.8 years ago by Istvan Albert ♦♦ 81k

Thank you for the reply.  So, if I'm understanding correctly, it should be possible to assume strandedness while applying htseq-count because, presumably, htseq-count is actually inspecting each read to determine strand, rather than relying on the tophat2 tag.  Is that indeed what you're saying?

Thanks again!

ADD REPLYlink written 3.8 years ago by cajawe0

in the strand specific protocol reads are separated into file1 or file2 based on their strand.  For example all reads in file2 are in the same orientation as the original transcript. So the tool will use only one of the files, the one that corresponds to the correct strand.

ADD REPLYlink written 3.8 years ago by Istvan Albert ♦♦ 81k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1694 users visited in the last hour