I'd like to run TopHat2 on a 50 bp single-end RNA-seq dataset to get gene counts for differential expression analysis. I was going to run it with --GTF ensembl_genes.gtf, since the TopHat2 paper reports that supplying annotation leads to significant gains in sensitivity and accuracy.
What I'm wondering is: how will overlapping Ensembl transcripts affect the results?
When TopHat generates a FASTA file from my Ensembl gene file, it includes multiple overlapping sequences. It seems like these would lead to ambiguous alignments, which would mean I need to merge the overlaps before running TopHat. I'm not finding any discussion of this on the forum or in the papers, so I wanted to double-check.
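
To make the concern concrete, here is a minimal sketch (not part of TopHat) of how the overlapping transcripts in a GTF could be listed. It assumes a standard tab-separated, 9-column GTF with `exon` rows that carry a `transcript_id` attribute. The function names and the demo records are hypothetical, and "overlap" here means only that two transcripts' genomic spans intersect on the same chromosome and strand.

```python
import re
from collections import defaultdict


def transcript_spans(gtf_lines):
    """Collapse exon rows into one (start, end) span per transcript.

    Keys are (chromosome, strand, transcript_id); coordinates are the
    1-based inclusive values used by GTF.
    """
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon rows are used to build spans
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (fields[0], fields[6], match.group(1))
        start, end = int(fields[3]), int(fields[4])
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    return spans


def overlapping_pairs(spans):
    """Return sorted (id_a, id_b) pairs whose spans intersect on the same chromosome and strand."""
    by_locus = defaultdict(list)
    for (chrom, strand, tid), (start, end) in spans.items():
        by_locus[(chrom, strand)].append((start, end, tid))
    pairs = []
    for items in by_locus.values():
        items.sort()  # sort by start so the inner loop can stop early
        for i, (_, end_a, tid_a) in enumerate(items):
            for start_b, _, tid_b in items[i + 1:]:
                if start_b > end_a:
                    break  # later spans start even further right
                pairs.append(tuple(sorted((tid_a, tid_b))))
    return sorted(pairs)


# Hypothetical demo records, not taken from a real annotation.
demo = [
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "tA";',
    'chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "tA";',
    'chr1\tdemo\texon\t350\t500\t.\t+\t.\tgene_id "g1"; transcript_id "tB";',
    'chr1\tdemo\texon\t1000\t1100\t.\t+\t.\tgene_id "g2"; transcript_id "tC";',
    'chr1\tdemo\texon\t150\t250\t.\t-\t.\tgene_id "g3"; transcript_id "tD";',
]

if __name__ == "__main__":
    print(overlapping_pairs(transcript_spans(demo)))
```

In the demo, tA and tB are isoforms of the same hypothetical gene, which is the common case in an Ensembl annotation; tD overlaps tA in coordinates but sits on the opposite strand, so this sketch does not pair them.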