Dear Community,
I am trying to generate RNAseq "feature count" tables with htseq-count, and I have run into an issue regarding how to properly deal with stranded sequence data.
Background: Paired-end RNA libraries were prepared using a stranded protocol (Illumina TruSeq Stranded mRNA Sample Preparation Kit), and the reads were mapped to the reference genome using tophat2. I was provided the tophat2 accepted-hits.bam files, and asked to conduct differential expression analysis, which I was hoping to do via DESeq2 after first generating feature count tables using htseq-count.
Problem: I've learned that the tophat2 mapping was simply conducted using default settings, which means that the library type was assumed to be unstranded ("--library-type fr-unstranded"), and not stranded ("--library-type fr-firststrand").
Questions: Since the mappings were performed under the assumption that reads were unstranded, should I continue to assume likewise while running htseq-count (via "--stranded=no")? Or should I request that the data be remapped using the correct library type in tophat2, and only then proceed with htseq-count (using, I believe, "--stranded=reverse")?
Many thanks!