9 months ago by
University College London Cancer Institute
In my experience, it makes little difference (if any) provided you just want raw counts over known transcripts. With HTSeq (htseq-count), which counts reads from an aligned BAM against a GTF/GFF annotation, you may see differences at genes where transcription occurs on both strands, such as sense/antisense pairs (e.g. XIST and TSIX). With programs like kallisto, which quantify abundances against a FASTA reference transcriptome, I believe that no, or only minimal, differences in abundances will be observed (and I have tested this).
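To make the antisense case concrete, here is a toy sketch in plain Python. It is not how htseq-count is implemented; it only mimics the general idea of assigning a read to a gene when it overlaps exactly one feature and calling it ambiguous when it overlaps several. The gene names, coordinates and reads are all made up, and I am assuming single-end reads whose alignment strand matches the strand of the transcript they came from.

```python
from collections import Counter

# Hypothetical annotation: two genes that overlap on opposite strands,
# as (name, start, end, strand). Coordinates are invented for illustration.
GENES = [
    ("geneA", 100, 500, "+"),
    ("geneB", 300, 700, "-"),
]

# Hypothetical aligned reads as (start, end, strand).
READS = [
    (120, 170, "+"),  # overlaps geneA only
    (350, 400, "+"),  # in the shared region, same strand as geneA
    (360, 410, "-"),  # in the shared region, same strand as geneB
    (600, 650, "-"),  # overlaps geneB only
]


def count_reads(reads, genes, stranded):
    """Assign each read to the single gene it overlaps.

    If stranded is True, a gene only qualifies when its strand matches
    the read's strand. Reads hitting several genes are ambiguous.
    """
    counts = Counter()
    for r_start, r_end, r_strand in reads:
        hits = [
            name
            for name, g_start, g_end, g_strand in genes
            if r_start <= g_end and r_end >= g_start
            and (not stranded or r_strand == g_strand)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif len(hits) > 1:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts


stranded_counts = count_reads(READS, GENES, stranded=True)
unstranded_counts = count_reads(READS, GENES, stranded=False)
print("stranded:  ", dict(stranded_counts))
print("unstranded:", dict(unstranded_counts))
```

In this toy data the two reads in the shared region are resolved to one gene each when strand is used, and become ambiguous when it is ignored. Outside of overlapping sense/antisense regions the two modes give the same answer, which is why the difference is often small in practice.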
A good aligner will align all reads regardless of the library type and record strand information in the BAM file. Programs like Cufflinks can then use this information to estimate strand-specific abundances, and HTSeq can check the strand of each aligned read against the strand of the features in the reference GTF/GFF.
I did this recently for a bacterial RNA-seq project where transcription occurred across virtually the entire circular genome on both strands. Had I selected unstranded in this case, I would have observed roughly double the count values and half the identified transcripts (all of them assigned to a single strand). A bacterial genome is obviously different from a mammalian one, though.
I would encourage you to read this great thread: How To Determine If Paired–End Illumina Rnaseq Reads Are Strand–Specific
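The basic idea behind checking strand-specificity can be sketched in a few lines: take reads that overlap exactly one annotated gene and ask what fraction align to the same strand as that gene. This is only an illustration under my own assumptions, not the method from the thread or from any particular tool. The function names, the toy data and the 0.2 tolerance are all hypothetical, and for paired-end data I am assuming you would look at the strand of the first mate only.

```python
# Hypothetical non-overlapping genes as (name, start, end, strand).
GENES = [
    ("geneA", 100, 500, "+"),
    ("geneB", 1000, 1500, "-"),
]


def sense_fraction(reads, genes):
    """Fraction of uniquely assignable reads on the same strand as their gene."""
    sense = antisense = 0
    for r_start, r_end, r_strand in reads:
        hits = [g for g in genes if r_start <= g[2] and r_end >= g[1]]
        if len(hits) != 1:
            continue  # skip reads with no gene or several genes
        if r_strand == hits[0][3]:
            sense += 1
        else:
            antisense += 1
    total = sense + antisense
    return sense / total if total else None


def guess_protocol(fraction, tolerance=0.2):
    """Crude call based on the sense fraction; the tolerance is arbitrary."""
    if fraction is None:
        return "undetermined"
    if fraction >= 1 - tolerance:
        return "forward-stranded"
    if fraction <= tolerance:
        return "reverse-stranded"
    if abs(fraction - 0.5) <= tolerance:
        return "unstranded"
    return "undetermined"


# Invented reads: mostly on the gene's own strand.
forward_reads = [(120, 170, "+")] * 4 + [(1100, 1150, "-")] * 4 + [(200, 250, "-")]
# Invented reads: evenly split between the two strands.
mixed_reads = [(120, 170, "+"), (130, 180, "-"), (1100, 1150, "-"), (1200, 1250, "+")]

forward_call = guess_protocol(sense_fraction(forward_reads, GENES))
mixed_call = guess_protocol(sense_fraction(mixed_reads, GENES))
print(forward_call, mixed_call)
```

On real data you would run this kind of tally on a subsample of alignments from the BAM and prefer genes that have no neighbour on the opposite strand, since overlapping genes blur the signal.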
Strandedness in RNA-seq analysis causes a lot of headaches, so I'm sure that others will have more to add.