I am working with Illumina HiSeq 2000 100 bp single-end RNA-seq data. Some of my samples come from unstranded libraries and some from stranded libraries. I'm trying to understand the best way to do read summarisation for these libraries using featureCounts, for eventual DGE analysis. To date I have treated all datasets as unstranded for both mapping (TopHat) and counting (featureCounts).
However, I am worried that read counts for my unstranded libraries will be biased for genes that have antisense transcripts, since reads originating from the antisense transcript will be merged into the counts for the gene on the sense strand wherever the two features overlap. The stranded libraries would not have this problem if counted in a strand-specific way, so the two library types may not be directly comparable for those genes. What is the recommended course of action here? I'm not interested in the antisense transcripts themselves, so should I continue to treat everything as unstranded for the featureCounts run? I have seen other threads here that suggest including library type (stranded vs. unstranded) as a factor in a multi-factor design for the DGE analysis, but I was hoping for a more thorough explanation of why that is the better workaround for this problem.
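To make the concern concrete, here is a toy Python sketch of what I think happens. Everything in it is made up for illustration (the gene coordinates, the reads, and the helper names `overlaps` and `count_reads` are hypothetical, and this is not how featureCounts is implemented). Each read carries the strand of the transcript it came from, which a stranded library preserves and an unstranded library does not, so unstranded counting has to ignore it.

```python
# Toy illustration only: the gene and reads are invented, not real data.
# Each read is (start, end, strand_of_origin). The gene of interest is on "+".
GENE = {"start": 100, "end": 500, "strand": "+"}

READS = [
    (120, 219, "+"),  # from the sense transcript
    (300, 399, "+"),  # from the sense transcript
    (350, 449, "-"),  # from an antisense transcript overlapping the gene
    (600, 699, "-"),  # outside the gene entirely
]


def overlaps(read, gene):
    """Return True if the read interval overlaps the gene interval."""
    start, end, _ = read
    return start <= gene["end"] and end >= gene["start"]


def count_reads(reads, gene, stranded):
    """Count reads overlapping the gene.

    If stranded is True, a read must also match the gene's strand.
    If stranded is False, strand is ignored (the unstranded case).
    """
    n = 0
    for read in reads:
        if not overlaps(read, gene):
            continue
        if stranded and read[2] != gene["strand"]:
            continue
        n += 1
    return n


unstranded_count = count_reads(READS, GENE, stranded=False)
stranded_count = count_reads(READS, GENE, stranded=True)
print("unstranded:", unstranded_count, "stranded:", stranded_count)
```

In this toy case the antisense read is added to the gene's count when strand is ignored but excluded when strand is respected, which is the discrepancy between library types that I am asking about.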
Thank you in advance.