Question

Can TopHat Deal With Overlapping Transcripts When Given --GTF?

0

10.5 years ago

bw. ▴ 260

I'd like to run TopHat2 on a 50 bp single-end RNA-seq dataset to get gene counts for differential expression analysis. I was going to run it with --GTF ensembl_genes.gtf, since the TopHat2 paper reports that this leads to significant gains in sensitivity and accuracy. What I'm wondering is: how will overlapping Ensembl transcripts affect the results?
When TopHat generates a FASTA file from my Ensembl gene file, it includes multiple overlapping sequences. It seems like these would lead to ambiguous alignments, and that I would need to merge overlaps before running TopHat, but I'm not finding any discussion of this on the forum or in the papers, so I wanted to double-check.

Thanks -Ben

tophat gtf • 2.5k views

score 0 · Answer 1 · 2013-10-27

0

10.5 years ago

Sean Davis 26k

TopHat deals with this situation just fine. In fact, at the mapping stage, overlapping transcripts are not really a big problem. However, such overlapping transcripts are harder to deal with at the quantification step (after alignment), which is what Cufflinks and other quantification tools try to address.
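To illustrate why overlap matters at the quantification step rather than at mapping, here is a minimal Python sketch. It is not how Cufflinks or any particular counting tool works; the gene names, coordinates, and the `count_reads` helper are all made up for illustration. A read that aligns unambiguously to the genome can still fall inside two annotated genes, and a simple counter then has to decide what to do with it.

```python
from collections import namedtuple

# Hypothetical annotation: 1-based, inclusive coordinates on one chromosome.
Gene = namedtuple("Gene", "name start end")

def overlapping_genes(read_start, read_end, genes):
    """Return the names of all genes whose interval overlaps the read."""
    return [g.name for g in genes if g.start <= read_end and read_start <= g.end]

def count_reads(reads, genes):
    """Count reads per gene; reads hitting more than one gene are set aside."""
    counts = {g.name: 0 for g in genes}
    ambiguous = 0
    for start, end in reads:
        hits = overlapping_genes(start, end, genes)
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif len(hits) > 1:
            ambiguous += 1  # uniquely mapped, but not uniquely assignable
    return counts, ambiguous

# geneA and geneB overlap between positions 400 and 500.
genes = [Gene("geneA", 100, 500), Gene("geneB", 400, 900)]
# Three 50 bp reads, each with a single, unambiguous genomic position.
reads = [(120, 169), (420, 469), (600, 649)]

counts, ambiguous = count_reads(reads, genes)
print(counts, ambiguous)
```

The middle read has exactly one alignment position, so the aligner has nothing to resolve; the ambiguity only appears when the read is assigned to a gene.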

0

That is not entirely true. TopHat may mispredict novel splice junctions by choosing a splicing motif from the wrong strand.
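As a rough sketch of the idea (not TopHat's actual junction-calling logic; `infer_intron_strand` is a hypothetical helper), the strand of an unstranded spliced alignment is commonly inferred from the intron's terminal dinucleotides: a canonical GT...AG intron on the forward strand reads as CT...AC when the gene lies on the reverse strand.

```python
def infer_intron_strand(intron_seq):
    """Guess the transcript strand from the intron's terminal dinucleotides.

    intron_seq is the intron as read on the reference (forward) strand.
    Only the canonical GT-AG motif is considered in this sketch.
    """
    seq = intron_seq.upper()
    donor, acceptor = seq[:2], seq[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "+"   # canonical motif read directly
    if (donor, acceptor) == ("CT", "AC"):
        return "-"   # reverse complement of GT...AG
    return None      # non-canonical or undetermined

print(infer_intron_strand("GTAAGTTTTCAG"))  # forward-strand intron
print(infer_intron_strand("CTGAAAACTTAC"))  # same intron, reverse strand
```

Where sense and antisense transcripts overlap, both motifs can be plausible near the same position, which is one way a novel junction could end up assigned to the wrong strand.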

ADD REPLY • link 10.5 years ago by Gww ★ 2.7k

0

TopHat is designed with overlapping transcripts, as well as antisense transcripts, in mind, but you are correct that TopHat can and does produce false-positive (and false-negative) results.

ADD REPLY • link 10.5 years ago by Sean Davis 26k