I have two questions about running TopHat:
(1) The step “Searching for junctions via segment mapping” takes a very long time, and I get the following message:
“Coverage-search algorithm is turned on, making this step very slow Please try running TopHat again with the option (--no-coverage-search) if this step takes too much time or memory.”
I’d like to know exactly what the differences are between running with coverage search and running with “--no-coverage-search”. If I use the “--no-coverage-search” option, what impact might it have on TopHat's results and accuracy?
(2) I use the -G option to provide a gene model annotation GTF file (genes.gtf). I notice that on each TopHat run, it builds a Bowtie index for genes.gtf on the fly:
“Building Bowtie index from genes.fa”
This step takes two hours (I use the main human annotation GTF file from GENCODE).
I have 3 conditions, and each condition has 3-4 pairs of FASTQ read files, so I have 10 TopHat runs in my script. The “Building Bowtie index from genes.fa” step was executed 10 times, even though all runs use the same genes.gtf file.
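To make the setup concrete, a script of this shape might look like the sketch below. The sample names, the hg19 genome index prefix, the output directories, and the FASTQ file names are all hypothetical placeholders, not my actual files. The sketch only prints the commands rather than running them.

```shell
#!/bin/sh
# Dry-run sketch: prints one TopHat command per sample instead of running it.
# Every name below (samples, index prefix, file names) is a hypothetical placeholder.
GTF="genes.gtf"        # GENCODE annotation passed with -G
GENOME_INDEX="hg19"    # assumed Bowtie index prefix for the genome
SAMPLES="condA_1 condA_2 condA_3 condB_1 condB_2 condB_3 condC_1 condC_2 condC_3 condC_4"

build_cmd() {
    # $1 = sample name; paired reads assumed to be <sample>_R1.fastq and <sample>_R2.fastq
    echo "tophat -G ${GTF} -o tophat_out_${1} ${GENOME_INDEX} ${1}_R1.fastq ${1}_R2.fastq"
}

# One command per sample: 3 + 3 + 4 = 10 runs, each passing the same genes.gtf
for s in $SAMPLES; do
    build_cmd "$s"
done
```

Because every command passes the same -G genes.gtf, each of the 10 runs repeats the index-building step.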
Is there a way to make TopHat reuse the Bowtie index built for genes.gtf in the first run for the subsequent runs?
I’d greatly appreciate any ideas and suggestions.
Thank you very much!