Question: How to speed up TopHat2 runs
yuabrahamliu wrote (18 months ago):

Hi, all. I'm starting to use tophat2 to map dUTP strand-specific, paired-end RNA-seq data to the mm9 genome. Each sample pair consists of two files of about 500 MB each. However, the run time is far too long. My command line is:

tophat2 -o tophatoutput -p 28 --no-coverage-search --library-type=fr-firststrand --transcriptome-index=topHat/mm9 bowtie2/mm9 sample.R1.trimmed_1.fastq.gz sample.R2.trimmed_2.fastq.gz

I started the command at 7:30 pm with the number of threads set as high as 28, but now it is 10:30 pm and not even one sample pair has finished. I have 6 sample pairs, so the whole set may take far too long. Is there something wrong with my command line, or can I do something to speed up the run? I would appreciate your suggestions. Below is the progress report printed while the program runs; it is now stuck at the "Reporting output tracks" step. Thank you.

[2017-11-17 19:27:08] Checking for Bowtie
  Bowtie version: 2.2.5.0
[2017-11-17 19:27:08] Checking for Bowtie index files (transcriptome)..
  Found both Bowtie1 and Bowtie2 indexes.
[2017-11-17 19:27:08] Checking for Bowtie index files (genome)..
[2017-11-17 19:27:08] Checking for reference FASTA file
[2017-11-17 19:27:08] Generating SAM header for /HPCTMP_NOBKUP/wl314/data/other/ReadsMapIndexFiles/bowtie2/mm9
[2017-11-17 19:27:11] Reading known junctions from GTF file
[2017-11-17 19:27:14] Preparing reads
  left reads: min. length=20, max. length=51, 20623862 kept reads (3111 discarded)
  right reads: min. length=20, max. length=51, 20619508 kept reads (7465 discarded)
[2017-11-17 19:36:47] Using pre-built transcriptome data..
[2017-11-17 19:36:48] Mapping left_kept_reads to transcriptome mm9 with Bowtie2
[2017-11-17 19:43:03] Mapping right_kept_reads to transcriptome mm9 with Bowtie2
[2017-11-17 19:49:29] Resuming TopHat pipeline with unmapped reads
[2017-11-17 19:49:29] Mapping left_kept_reads.m2g_um to genome mm9 with Bowtie2
[2017-11-17 20:11:09] Mapping left_kept_reads.m2g_um_seg1 to genome mm9 with Bowtie2 (1/2)
[2017-11-17 20:14:28] Mapping left_kept_reads.m2g_um_seg2 to genome mm9 with Bowtie2 (2/2)
[2017-11-17 20:18:38] Mapping right_kept_reads.m2g_um to genome mm9 with Bowtie2
[2017-11-17 20:40:32] Mapping right_kept_reads.m2g_um_seg1 to genome mm9 with Bowtie2 (1/2)
[2017-11-17 20:44:06] Mapping right_kept_reads.m2g_um_seg2 to genome mm9 with Bowtie2 (2/2)
[2017-11-17 20:48:40] Searching for junctions via segment mapping
[2017-11-17 20:54:01] Retrieving sequences for splices
[2017-11-17 20:55:42] Indexing splices
  Building a SMALL index
[2017-11-17 20:56:11] Mapping left_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2017-11-17 20:58:59] Mapping left_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2017-11-17 21:02:31] Joining segment hits
[2017-11-17 21:05:46] Mapping right_kept_reads.m2g_um_seg1 to genome segment_juncs with Bowtie2 (1/2)
[2017-11-17 21:08:25] Mapping right_kept_reads.m2g_um_seg2 to genome segment_juncs with Bowtie2 (2/2)
[2017-11-17 21:12:00] Joining segment hits
[2017-11-17 21:15:20] Reporting output tracks
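To see where the time actually goes, the per-step durations can be read off the timestamps in this log. A minimal sketch in POSIX shell + awk, assuming the log has been saved to a file (the name `tophat_console.log` below is hypothetical) and that the run does not cross midnight: each timestamped step is charged with the time until the next timestamp, and the last step is reported as still open.

```shell
# Print how many minutes each TopHat step took, based on the
# "[YYYY-MM-DD HH:MM:SS] message" lines of a saved log.
# Assumption: all timestamps fall on the same day (no date arithmetic).
step_durations() {
  awk '
    match($0, /^\[[0-9-]+ [0-9:]+\]/) {
      ts = substr($0, 13, 8)            # the HH:MM:SS part of the timestamp
      split(ts, t, ":")
      secs = t[1] * 3600 + t[2] * 60 + t[3]
      msg = substr($0, RLENGTH + 2)     # the text after "] "
      if (have_prev) printf "%4d min  %s\n", (secs - prev) / 60, prev_msg
      prev = secs; prev_msg = msg; have_prev = 1
    }
    END {
      # The final step has no later timestamp, so its duration is unknown.
      if (have_prev) printf "   ? min  %s (no later timestamp)\n", prev_msg
    }
  ' "${1:-/dev/stdin}"
}
```

Usage: `step_durations tophat_console.log`. Untimestamped detail lines (read counts, "Building a SMALL index") are skipped. For the log above, everything before "Reporting output tracks" took about 1 h 48 min (19:27 to 21:15).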


I'm starting to use tophat2

Since you have just started with TopHat, this is also the ideal moment to stop using it.

You should know that the old 'Tuxedo' pipeline of TopHat and Cufflinks is no longer the advisable tool for RNA-seq analysis. The software is deprecated (or at best in low-maintenance mode) and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment with salmon.
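For comparison, a rough HISAT2 equivalent of the TopHat2 command from the question might look like the sketch below. It only prints the command (a dry run) so it can be checked first. The index prefix `hisat2_index/mm9` is a hypothetical path to an index built with `hisat2-build`, and the pipe into `samtools sort` assumes samtools is installed. `--rna-strandness RF` is HISAT2's setting for dUTP / fr-firststrand paired-end data, and `--dta` tailors the alignments for downstream assembly with StringTie.

```shell
# Hypothetical index prefix (built beforehand with hisat2-build) and thread count.
HISAT2_INDEX="hisat2_index/mm9"
THREADS=28

# Print (not run) the HISAT2 + samtools command for one sample pair.
hisat2_cmd() {
  s="$1"
  cmd="hisat2 -p ${THREADS} --dta --rna-strandness RF -x ${HISAT2_INDEX}"
  cmd="${cmd} -1 ${s}.R1.trimmed_1.fastq.gz -2 ${s}.R2.trimmed_2.fastq.gz"
  # HISAT2 writes SAM to stdout; samtools sort reads it from stdin ("-").
  cmd="${cmd} | samtools sort -@ 8 -o ${s}.bam -"
  printf '%s\n' "$cmd"
}
```

Once the printed line looks right, `hisat2_cmd sample | sh` runs it for real.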

Reply by WouterDeCoster, 18 months ago

It is not uncommon for TopHat runs on large datasets to take a full day. That is why you start them before going home and forget about them until the next day.
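Following that advice, the six pairs can be queued to run one after another unattended. A minimal POSIX-shell sketch, assuming hypothetical sample prefixes `sample1` to `sample6` named like the files in the question. It reuses the original TopHat2 options but gives each pair its own output directory, because a shared `-o tophatoutput` would be overwritten by the next pair.

```shell
# Hypothetical sample prefixes; replace with the real names of the six pairs.
SAMPLES="sample1 sample2 sample3 sample4 sample5 sample6"

# Print one TopHat2 command per pair, with the same options as in the question.
tophat_cmds() {
  for s in $SAMPLES; do
    # Separate output directory per pair so results are not overwritten.
    printf '%s\n' "tophat2 -o tophat_${s} -p 28 --no-coverage-search --library-type=fr-firststrand --transcriptome-index=topHat/mm9 bowtie2/mm9 ${s}.R1.trimmed_1.fastq.gz ${s}.R2.trimmed_2.fastq.gz"
  done
}
```

Usage: `tophat_cmds > run_all.sh`, then `nohup sh run_all.sh > run_all.log 2>&1 &` before leaving. The pairs run sequentially, each still using 28 threads, and the job survives logging out.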

Reply by genomax, 18 months ago

... or the whole weekend.

Reply by michael.ante, 18 months ago

TopHat usually needs a lot of time. The "Reporting output tracks" step is often the most time-consuming one, and it shows no progress while it runs.

If you need something faster, try BBMap or STAR.
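A sketch of the STAR route (BBMap is not shown), again as a dry run, assuming a hypothetical genome directory `star_index/mm9` prepared beforehand with `STAR --runMode genomeGenerate`. STAR does not need the library strandedness at alignment time, but it does need a lot of memory for a mouse genome index (typically around 30 GB).

```shell
# Hypothetical genome directory built with STAR --runMode genomeGenerate.
STAR_INDEX="star_index/mm9"

# Print (not run) the STAR command for one sample pair.
star_cmd() {
  s="$1"
  cmd="STAR --runThreadN 28 --genomeDir ${STAR_INDEX}"
  cmd="${cmd} --readFilesIn ${s}.R1.trimmed_1.fastq.gz ${s}.R2.trimmed_2.fastq.gz"
  # The inputs are gzipped, so STAR decompresses them on the fly with zcat.
  cmd="${cmd} --readFilesCommand zcat"
  # Output: <sample>_Aligned.sortedByCoord.out.bam (coordinate-sorted BAM).
  cmd="${cmd} --outSAMtype BAM SortedByCoordinate --outFileNamePrefix ${s}_"
  printf '%s\n' "$cmd"
}
```

As with the other dry runs, `star_cmd sample | sh` executes the printed command.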

Reply by michael.ante, 18 months ago (edited)

To add to Michael's comment, be aware that the TopHat2/Cufflinks suite of programs ('Tuxedo', as they called it) is no longer supported. The replacements are HISAT2/StringTie ('New Tuxedo'). See their publication here: https://www.nature.com/articles/nprot.2016.095
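For the StringTie step, a simplified dry-run sketch that only quantifies the annotated transcripts and writes Ballgown tables. The published protocol additionally assembles each sample and merges the assemblies with `stringtie --merge` before this step, which is not shown here. The GTF path and the `<sample>.bam` naming are hypothetical; the BAM must be coordinate-sorted, and `--rf` tells StringTie the library is fr-firststrand (dUTP), matching `--library-type=fr-firststrand` in the TopHat2 command.

```shell
# Hypothetical annotation file; use the same mm9 GTF as for TopHat2.
GTF="genes.gtf"

# Print (not run) the StringTie command for one coordinate-sorted BAM.
stringtie_cmd() {
  s="$1"
  # -e: estimate abundances of the reference transcripts only (requires -G).
  # -B: write Ballgown tables (*.ctab) next to the output GTF.
  printf '%s\n' "stringtie ${s}.bam -p 8 --rf -G ${GTF} -e -B -o ballgown/${s}/${s}.gtf"
}
```

Usage: `mkdir -p ballgown/sample && stringtie_cmd sample | sh`; the `ballgown/` folder can then be loaded with the Ballgown R package.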

Coincidentally, I need a new tuxedo ('suit' in British/Irish English).

Reply by Kevin Blighe, 18 months ago

It really took about 6 hours to complete the first sample. Thank you.

Reply by yuabrahamliu, 18 months ago

Please use ADD COMMENT or ADD REPLY to respond to previous posts, so that the thread remains logically structured and easy to follow. I have now moved your post, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.

If an answer was helpful, you should upvote it.

Reply by WouterDeCoster, 18 months ago