Question

Using htseq-count when a stranded library was assumed to be unstranded during tophat2 mapping

0

Entering edit mode

8.4 years ago

cajawe • 0

Dear Community,

I am trying to generate RNAseq "feature count" tables with htseq-count, and I have run into an issue regarding how to properly deal with stranded sequence data.

Background: Paired-end RNA libraries were prepared using a stranded protocol (Illumina TruSeq Stranded mRNA Sample Preparation Kit), and the reads were mapped to the reference genome using tophat2. I was provided the tophat2 accepted-hits.bam files, and asked to conduct differential expression analysis, which I was hoping to do via DESeq2 after first generating feature count tables using htseq-count.

Problem: I've learned that the tophat2 mapping was simply conducted using default settings, which means that the library type was assumed to be unstranded (--library-type fr-unstranded), and not stranded (--library-type fr-firststrand).

Questions: Since the mappings were performed under the assumption that reads were unstranded, should I continue to assume likewise while running htseq-count (via --stranded=no)? Or should I request that the data be remapped using the correct library type in tophat2, and only then proceed with htseq-count (using, I believe, --stranded=reverse)?

Many thanks!

RNA-Seq htseq-count tophat2 alignment • 4.3k views

ADD COMMENT • link updated 21 months ago by Ram 43k • written 8.4 years ago by cajawe • 0

Ram · Accepted Answer · 2015-12-04

1

Entering edit mode

8.4 years ago

Istvan Albert 100k

My understanding based on documentation is that passing he --library-type parameter to tophat just adds an XS tag to each alignment.

This is tag is almost certainly not used by htseq since it is an aligner specific tag that is not guaranteed to be present for all aligners. So IMO you don't need to realign the data.

ADD COMMENT • link 8.4 years ago by Istvan Albert 100k

0

Entering edit mode

Thank you for the reply. So, if I'm understanding correctly, it should be possible to assume strandedness while applying htseq-count because, presumably, htseq-count is actually inspecting each read to determine strand, rather than relying on the tophat2 tag. Is that indeed what you're saying?

Thanks again!

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by cajawe • 0

0

Entering edit mode

in the strand specific protocol reads are separated into file1 or file2 based on their strand. For example all reads in file2 are in the same orientation as the original transcript. So the tool will use only one of the files, the one that corresponds to the correct strand.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Istvan Albert 100k