Question: Optimal parameters for TopHat on Drosophila data
4.1 years ago by
Hi everyone!

This will be my first time using TopHat. I'm working on a Drosophila genome, and while reading the manual before starting, I came across this recommendation:

Please note: TopHat has a number of parameters and options, and their default values are tuned for processing mammalian RNA-Seq reads. If you would like to use TopHat for another class of organism, we recommend setting some of the parameters with more strict, conservative values than their defaults. Usually, setting the maximum intron size to 4 or 5 Kb is sufficient to discover most junctions while keeping the number of false positives low.

As Drosophila genomes are very gene-dense, maybe there is an optimal parameter set. Does anyone have an idea of what I should modify to improve my results?

PS: The Drosophila species I'm working on also has a high level of polymorphism, which could be a problem. My main goal is to annotate an assembly I've made of this species, and to do that I need to produce RNA-seq evidence from my data. That's why I'm using TopHat at this stage.

Thanks for your help!


Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2 which provides the same core functionality (i.e. spliced alignment of RNA-Seq reads), in a more accurate and much more efficient way.

You just need to use any splice-aware aligner. Both STAR and BBMap are fast. BBMap is simple to use.

So you are all suggesting that TopHat2 is not appropriate for what I'm trying to do?

It's not inappropriate. But it's not the best either, especially now that it's deprecated.

I also heard that HISAT2 is going to be the new top-notch tool for RNA-seq. Try it.

Well, I'm downloading it now and I'm going to try it. But my question stays the same, right? The HISAT2 paper talks about human genomes, so maybe its default parameters are likewise tuned for genomes that are not gene-dense. Drosophila genomes are very gene-dense (by which I mean that genes lie very close to each other), and I know this can be a problem: neighboring genes are sometimes merged into one. I was looking for advice on parameters that help avoid this. In the meantime, I'm trying HISAT2 and reading its manual.

The first thing you should always do is try the default settings. Then, as suggested by the TopHat manual, you could limit the maximum intron size. I think this will make the main difference between a gene-dense and a gene-sparse genome.
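To make the intron-size suggestion concrete, here is a minimal Python sketch that only builds the command lines, assuming single-end reads. The 5000 bp cap is the upper end of the "4 or 5 Kb" quoted from the TopHat manual, not a value tuned for Drosophila. The index name, read file, and output names are hypothetical placeholders. The flags are `-I` (max intron length) for TopHat2 and `--max-intronlen` for HISAT2; check them against the manual of the version you have installed.

```python
import shlex

# Upper end of the "4 or 5 Kb" suggested in the TopHat manual (an assumption,
# not a value validated on Drosophila data).
MAX_INTRON = 5000


def tophat_cmd(index, reads, out_dir, max_intron=MAX_INTRON):
    """Build a TopHat2 command line with a capped maximum intron length."""
    # Multiple single-end read files are passed as one comma-separated argument.
    return ["tophat2", "-I", str(max_intron), "-o", out_dir,
            index, ",".join(reads)]


def hisat2_cmd(index, reads, out_sam, max_intron=MAX_INTRON):
    """Build the equivalent HISAT2 command line for unpaired reads."""
    return ["hisat2", "--max-intronlen", str(max_intron),
            "-x", index, "-U", ",".join(reads), "-S", out_sam]


if __name__ == "__main__":
    reads = ["reads.fastq"]  # hypothetical file name
    print(shlex.join(tophat_cmd("dmel_index", reads, "tophat_out")))
    print(shlex.join(hisat2_cmd("dmel_index", reads, "aln.sam")))
```

Running a default alignment first and then a capped one, and comparing the reported junctions, would show how much the limit actually changes on your data.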

