Hey everyone,
I'm trying to figure out the appropriate TopHat settings for strand-specific, paired-end RNA-seq data. I've read several posts about this but am still unsure of the right settings. I'm hoping someone can confirm that my understanding of sense/antisense strands and forward/reverse reads is correct. As I understand it:
1) The mRNA sequence matches the sense (coding) strand of the DNA, apart from uracil (U) replacing thymine (T) and the introns being spliced out.
2) In library prep (TruSeq Stranded in my case), the first cDNA strand, which is antisense to the mRNA, is used for sequencing, while the second strand, which is marked with dUTP, gets degraded.
3) For paired-end sequencing, after bridge PCR and sequencing, the sense strand becomes read 1 (the forward read) and the antisense strand becomes read 2 (the reverse read).
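To check points 1 to 3 on a concrete sequence, here is a toy Python sketch that traces one fragment through a dUTP-style library as that method is usually described: the first-strand cDNA is the reverse complement of the mRNA, the dUTP-marked second strand is not carried through amplification, and read 1 is sequenced from the 5' end of the first strand. It assumes a single fragment spanning the whole mRNA, no adapters and no sequencing errors; the function names and the example sequence are made up for illustration.

```python
# Toy model of a dUTP-style stranded, paired-end library (hypothetical helpers).
# Assumptions: one fragment spanning the whole mRNA, no adapters, no errors,
# and read 1 is sequenced from the 5' end of the first-strand cDNA.

COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence, returned as DNA."""
    return seq.translate(COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """Point 1: mRNA has the sense (coding) strand's sequence, with U for T."""
    return coding_strand.replace("T", "U")


def dutp_read_pair(mrna: str, read_len: int) -> tuple[str, str]:
    """Return (read1, read2) for one fragment under the dUTP model."""
    # Point 2: first-strand cDNA is antisense (reverse complement of the mRNA).
    first_strand = reverse_complement(mrna)
    # The dUTP-marked second strand is not amplified; PCR copies the first
    # strand instead, so the fragment's other strand carries the sense sequence.
    pcr_copy = reverse_complement(first_strand)
    read1 = first_strand[:read_len]  # read 1: 5' end of the first strand
    read2 = pcr_copy[:read_len]      # read 2: 5' end of the opposite strand
    return read1, read2


coding = "ATGGCCATTGTAATGGGCCGC"
mrna = transcribe(coding)
read1, read2 = dutp_read_pair(mrna, 8)
print("read 1:", read1)  # reverse complement of the mRNA's 3' end
print("read 2:", read2)  # same sequence as the mRNA's 5' end (as DNA)
```

In this model read 1 comes out antisense to the transcript and read 2 comes out sense, which is the orientation that TopHat's fr-firststrand and HTSeq's "reverse" setting are built around; it is worth comparing that against point 3.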
So, when running TopHat, should I supply read 1 (/1) as the forward reads and read 2 (/2) as the reverse reads, with the library type set to fr-firststrand?
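TopHat's paired-end library types differ only in which genomic strand each mate is expected to align to relative to the transcript. The sketch below encodes that rule as the TopHat manual describes it (fr-firststrand covers dUTP libraries, where read 1 aligns to the strand opposite the transcript) and also assembles the equivalent command-line call as a list without running it. The helper names and file names are hypothetical; in Galaxy the same choice should appear as the library-type option of the TopHat tool.

```python
# Hypothetical helpers illustrating TopHat's paired-end library types.
# "+" and "-" are genomic strands; None means "either strand is accepted".


def flip(strand: str) -> str:
    """Return the opposite genomic strand."""
    return "-" if strand == "+" else "+"


def expected_mate_strands(library_type: str, transcript_strand: str):
    """Strands that (read 1, read 2) should align to for a given transcript."""
    if library_type == "fr-unstranded":
        return None, None
    if library_type == "fr-firststrand":   # e.g. dUTP: read 1 is antisense
        return flip(transcript_strand), transcript_strand
    if library_type == "fr-secondstrand":  # read 1 is sense
        return transcript_strand, flip(transcript_strand)
    raise ValueError(f"unknown library type: {library_type!r}")


def tophat_command(index: str, reads_1: str, reads_2: str,
                   out_dir: str = "tophat_out") -> list[str]:
    """Build (but do not run) a TopHat call: /1 file first, then /2 file."""
    return ["tophat", "--library-type", "fr-firststrand", "-o", out_dir,
            index, reads_1, reads_2]


print(expected_mate_strands("fr-firststrand", "+"))
print(" ".join(tophat_command("genome_index", "sample_1.fastq", "sample_2.fastq")))
```

The order of the two FASTQ files matters: the first file is treated as read 1, which is the mate the library type describes.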
My ultimate goal is to use the BAM files from TopHat to get raw counts with HTSeq, where I understand the appropriate "stranded" setting is "reverse". By the way, I'm doing all of this in Galaxy, as I'm not very proficient at coding. Thanks in advance for any reassurance.
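HTSeq's strandedness setting amounts to a strand check on each pair. The simplified sketch below paraphrases the rule from the htseq-count documentation ("yes": read 1 on the feature's strand and read 2 on the opposite strand; "reverse": the other way round) and shows how a dUTP-style pair from a plus-strand gene fares under each setting. It ignores overlap modes and ambiguity handling; the function names are made up, genes.gtf is a placeholder, and the command list is only built, not run.

```python
# Simplified sketch of htseq-count's strand check for a paired-end read
# (hypothetical helper; overlap modes and ambiguous features are ignored).


def passes_strand_check(stranded: str, feature_strand: str,
                        read1_strand: str, read2_strand: str) -> bool:
    """True if the pair may be counted for a feature on feature_strand."""
    if stranded == "no":
        return True
    if stranded == "yes":
        # read 1 on the feature's strand, read 2 on the opposite strand
        return read1_strand == feature_strand and read2_strand != feature_strand
    if stranded == "reverse":
        # read 1 opposite the feature, read 2 on the feature's strand
        return read1_strand != feature_strand and read2_strand == feature_strand
    raise ValueError(f"unknown --stranded value: {stranded!r}")


def htseq_command(bam: str, gtf: str) -> list[str]:
    """Build (but do not run) an htseq-count call for a dUTP library."""
    # Check whether the BAM is name- or position-sorted and set -r to match.
    return ["htseq-count", "-f", "bam", "-s", "reverse", bam, gtf]


# A dUTP-style pair from a plus-strand gene: read 1 on "-", read 2 on "+".
for setting in ("yes", "no", "reverse"):
    print(setting, passes_strand_check(setting, "+", "-", "+"))
print(" ".join(htseq_command("accepted_hits.bam", "genes.gtf")))
```

Under "yes" the dUTP pair fails the check for its own gene, which is why a strandedness setting that does not match the library typically leaves genes with far fewer counts than expected.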