I have just finished running count commands using HTSeq-Count and am about to begin DGE analysis with DESeq2 using the vignette as a guide. However, I found this blog post online which indicates that there may be issues if you do not take into account the fact that Illumina data produces antisense reads.
Having read the blog post a few times, I'm still a little confused. It seems that there is now an argument to the
preprocess.reads=invertStrand that allows you to take strandedness into account. Is this all I need to alter for my data? Or do I need to invert the strand using GenomicRanges? Outside of this, I did not take the strandedness into account with my alignment or counts so I am concerned that my raw count data may also be incorrect.
Here is what I ran for Tophat2 alignment:
tophat2 -G <GTF file> -p 4 -o <output folder> <index> <FASTQ file>
And here is what I ran for HTSeq-Count:
python -m HTSeq.scripts.count -s yes -a 10 <name sorted SAM> <GTF file> > <output>
As you can see, I did not take into account the library type for my tophat2 run. For HTSeq-Count I specified it was stranded but not to reverse it. So I really am not sure where this leaves me...
According the facility that did the sequencing, they used the TruSeq Stranded Total RNA kit. This experiment is single-read from mouse neural tissue (RIPSeq). Let me know if any more information is needed.
In summary, my questions:
- Is my count data okay as is?
- What do I need to do in DESeq2 to correct for this antisense issue?
Versions: Tophat 2.1.1 with Bowtie 2.2.9 and Samtools 0.1.19; HTSeq 0.6.1