Question

Why does TopHat (v2.1.1) alignment time increase instead of decreasing when more threads are used?

0

6.9 years ago

mbkmastkhel ▴ 10

I used TopHat (v. 2.1.1) to align RNA-Seq reads of zebrafish to a genome (Danio_rerio.GRCz10.pep.abinitio.fa) on a system with the following specifications:

System specifications:

Memory: 4GB RAM

Processor: Intel® Core™ i5-4590 CPU @ 3.30GHz × 4

OS: Ubuntu 16.04 LTS (64 bits)

Hardware: 64 bits Architecture

TopHat Command:

time tophat --solexa-quals -g 2 -p 1 --no-coverage-search -j annotation/Danio_rerio.Zv9.66.spliceSites -o tophatoutput/ZV9 genome/zebrafish data/SRR630464_1.fastq data/SRR630464_2.fastq

TopHat Command Parameters

TopHat command parameters that we used are listed below:

-g Maximum number of multi hits allowed. Short reads are likely to map to more than one location in the genome even though these reads can have originated from only one of these regions. In RNA-Seq we allow for a restricted number of multi hits, and in this case we ask Tophat to report only reads that map at most onto 2 different loci.

-p Use this many threads to align reads.

--library-type Before performing any type of RNA-Seq analysis you need to know a few things about the library preparation. Was it done using a strand-specific protocol or not? If yes, which strand? In our data the protocol was NOT strand specific.

--no-coverage-search Disables the coverage-based search for junctions, which reduces run time and memory use.

-j Improve spliced alignment by providing TopHat with annotated splice junctions. Pre-existing genome annotation is an advantage when analyzing RNA-Seq data. This file contains the coordinates of annotated splice junctions from Ensembl. These are stored under the sub-directory annotation in a file called ZV9.spliceSites.

-o This specifies the subdirectory in which TopHat should save the output files.

My question: increasing the number of threads (multi-threading) should reduce the time it takes to align the reads to the genome. However, as the table below shows, the alignment time increases rather than decreases as I add threads. Only with two threads does the alignment time decrease; with 4, 8 and 16 threads it increases. Why?

Table

Files: FASTQ Files of Zebrafish accession number GSE42846. SRR630464_1.fastq/ SRR630464_2.fastq

Size: 4.2 GB per file

Nucleotide sequences: 24410561 sequences in each file

1 Thread: User time is 137 min 0 sec and system time is 5 min 12 sec and total time is 142 min 12 sec

2 Threads: User time is 120 min 33 sec and system time is 6 min 40 sec and total time is 127 min 13 sec

4 Threads: User time is 124 min 20 sec and system time is 19 min 54 sec and total time is 144 min 14 sec

8 Threads: User time is 122 min 41 sec and system time is 18 min 31 sec and total time is 141 min 12 sec

16 Threads: User time is 122 min 22 sec and system time is 21 min 27 sec and total time is 143 min 49 sec

Total time = User time + System Time

We used 1, 2, 4, 8 and 16 threads to align the reads.

Kindly help me find a valid reason.

Thanks in advance.

TopHat RNA-Seq Multithreading • 4.0k views

ADD COMMENT • link updated 6.9 years ago by novice ★ 1.1k • written 6.9 years ago by mbkmastkhel ▴ 10

3

After the actual alignment using Bowtie2 (which does scale well with multiple threads), TopHat spends a huge amount of time in a single-threaded phase that may dominate the overall time, depending on the data. The alternative tools WouterDeCoster pointed out generally don't do this.
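To put a rough number on why a single-threaded phase matters, here is a minimal sketch of Amdahl's law, which bounds the speedup when only part of a job can use extra threads. The function name and the serial fraction of 0.8 are illustrative assumptions, not measured TopHat values.

```python
def amdahl_speedup(serial_fraction: float, n_threads: int) -> float:
    """Upper bound on speedup when only part of a job runs in parallel."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part takes the same time regardless of thread count;
    # only the parallel part is divided among the threads.
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


# Hypothetical serial fraction of 0.8; NOT a measured TopHat value.
for n in (1, 2, 4, 8, 16):
    print(f"{n:>2} threads: at most {amdahl_speedup(0.8, n):.2f}x faster")
```

If most of the run is serial, the bound stays close to 1x no matter how many threads are requested, and the model ignores any cost of managing the threads.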

ADD REPLY • link 6.9 years ago by Brian Bushnell 20k

0

Thank you for your reply. Would you kindly explain what the single-threaded phase is?

ADD REPLY • link 6.9 years ago by mbkmastkhel ▴ 10

1

So you have a quad-core processor, do I understand that correctly? If you use more threads than physical cores, I can understand that it's not that efficient.

In addition, you should know that the old 'Tuxedo' pipeline of TopHat and Cufflinks is no longer the "advisable" tool for RNA-seq analysis. The software is deprecated / in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR and bbmap, or pseudo-alignment using kallisto or salmon.

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k

0

Thank you for your reply and time.

Yes sir, I have a quad-core system. It shows the best performance with two threads and worse performance as the number of threads increases, i.e. with 4, 8 and 16 threads. I want to find out exactly why this version of TopHat performs best with two threads and worse with 4, 8 and 16 threads.

ADD REPLY • link 6.9 years ago by mbkmastkhel ▴ 10

1

You only have 4 cores, and obviously, you also have background processes running. So at best 3 cores are fully free. Maybe you could add the time of 3 cores also to your analysis.

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k

0

This time is the CPU time: the time spent executing this process only. Time used by other processes is not counted. I have 4 cores, so I expected the best performance with 4 threads, but it performs worse than with 2 threads.

ADD REPLY • link 6.9 years ago by mbkmastkhel ▴ 10

1

I'm not an expert at server architecture/sysadmin stuff, so if I'm wrong I hope someone will correct me.

While you are running TopHat using 4 cores, your computer still has background processes running. Have a look in htop: many things are going on in the background. So therefore there aren't 4 cores free for TopHat.

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k

0

I put the time command before the tophat command. It reports real time, user time and system time. User + system time is the pure CPU time, i.e. the time the CPU spends on this process only. I already listed these times in the table above.

Actually, it shows the best performance with two threads instead of 4. Normally one would expect the best performance with 4 threads because there are 4 cores. I ran 4 different datasets, and all of them show the same pattern: best performance with 2 threads and worse with every other thread count, i.e. 4, 8 and 16 threads.
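A note on reading the `time` output: `user` and `sys` are summed over all threads of the process, so `user + sys` measures the total CPU work done, not how long the run took. The `real` line is the elapsed (wall-clock) time, and that is the figure that would be expected to shrink with more threads. The sketch below uses the user and system figures from the table in the question; `average_busy_cores` needs a `real` value, which was not posted, so it is not applied to the posted data. All function names are illustrative.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'M min S sec' reading to seconds."""
    return minutes * 60 + seconds


# (threads, user seconds, system seconds) as posted in the question's table.
POSTED = [
    (1, to_seconds(137, 0), to_seconds(5, 12)),
    (2, to_seconds(120, 33), to_seconds(6, 40)),
    (4, to_seconds(124, 20), to_seconds(19, 54)),
    (8, to_seconds(122, 41), to_seconds(18, 31)),
    (16, to_seconds(122, 22), to_seconds(21, 27)),
]


def cpu_seconds(user_s: float, sys_s: float) -> float:
    """CPU time consumed by the process, summed over all of its threads."""
    return user_s + sys_s


def average_busy_cores(user_s: float, sys_s: float, real_s: float) -> float:
    """Average number of cores kept busy; real_s is the wall-clock time."""
    return cpu_seconds(user_s, sys_s) / real_s


for threads, user_s, sys_s in POSTED:
    total = cpu_seconds(user_s, sys_s)
    print(f"{threads:>2} threads: CPU time {total // 60} min {total % 60} sec")
```

Total CPU time staying roughly flat across thread counts is what one would expect when the same amount of work is merely spread over more threads; comparing the `real` values is what shows whether the run actually finished sooner.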

ADD REPLY • link 6.9 years ago by mbkmastkhel ▴ 10

3

If you have 4 cores, that means you can utilize the CPU to 400%. If you have 4 threads, that means you can run 4 processes simultaneously (roughly), each at 100% CPU load. If you run 8+ threads on a 4 core / 4 thread processor, you're still running 4 threads at 100% at a time, but you're switching back and forth between two distinct threads on each CPU core. This adds overhead because it takes time and resources to switch between processes. You can't just throw some arbitrary number of threads at a job and think that means it gets done quicker. That's not how CPUs work. Plus you have to consider memory and I/O bottlenecks as well. In particular, 4 processes are not only going to use more CPU, but also more memory, cache, and I/O. If your system doesn't scale out well either for multi-threading or parallel processing, then you're beating a dead horse.
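One way to avoid oversubscribing the CPU is to cap the requested thread count at what the machine reports. This is a minimal sketch; `usable_threads` and its `reserve` parameter are hypothetical names. Note that `os.cpu_count()` returns logical CPUs, which can exceed the number of physical cores on processors with hyper-threading.

```python
import os


def usable_threads(requested: int, reserve: int = 0) -> int:
    """Cap a requested thread count at the logical CPUs available.

    reserve: CPUs to leave free for background processes (assumption:
    the caller decides how many, if any, to hold back).
    """
    cores = os.cpu_count() or 1       # cpu_count() may return None
    limit = max(1, cores - reserve)   # always allow at least one thread
    return max(1, min(requested, limit))


# Example: ask for 16 threads but leave one CPU for the rest of the system.
print(usable_threads(16, reserve=1))
```

The result could then be passed as the value of `-p` in the TopHat command from the question.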

ADD REPLY • link 6.9 years ago by mforde84 ★ 1.4k

0

Thank you so much

ADD REPLY • link 6.9 years ago by mbkmastkhel ▴ 10

0

So my results are correct, and the problem is synchronization and communication between multiple threads. Because two threads need less synchronization and communication time, performance is good with two threads. Synchronization and communication time increases as we add threads, which is why alignment time increases with 4, 8 and 16 threads. Am I correct?

ADD REPLY • link 6.9 years ago by mbkmastkhel ▴ 10

0

Not entirely. You missed the following points:

memory limitation
I/O limitation
overhead due to the limited number of physical CPU cores on your system

I get the feeling you are not even reading what we write.

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k

1

Lol, you only have a finite number of threads, dude.

http://ark.intel.com/products/80815/Intel-Core-i5-4590-Processor-6M-Cache-up-to-3_70-GHz

ADD REPLY • link 6.9 years ago by mforde84 ★ 1.4k

0

Thank you

ADD REPLY • link 6.9 years ago by mbkmastkhel ▴ 10

1

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

ADD REPLY • link 6.9 years ago by WouterDeCoster 47k

score 3 · Answer 1 · 2017-06-06

3

6.9 years ago

novice ★ 1.1k

Even if resources weren't a problem, you will generally hit a point of diminishing returns on parallelism. This is when the time the software puts into parallelizing the computation (e.g. distributing tasks to threads, collecting results from threads, etc.) outweighs the time saved compared with running the program with fewer threads, or serially.

This is very hand-wavy, but think of it like a group of people doing a project. If you divide the project into 10 tasks, you would save time by assigning the tasks to 10 people. But if you assign the 10 tasks to 100 people, the difficulty of communication and synchronization between those workers would cause diminishing returns on efficiency. None of this considers the part of the work that the manager does after everyone finishes their task, i.e. the serial part of the program. As Brian pointed out above, that part is independent of the number of workers and represents a bottleneck on speedup.
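The analogy can be turned into a toy model: a fixed serial part, a parallel part divided among the threads, and a coordination cost that grows with each extra thread. This is only a sketch under those assumptions; the function names and the numbers in the example are hypothetical and are not fitted to TopHat.

```python
def modelled_runtime(n_threads, serial_s, parallel_s, overhead_per_thread_s):
    """Toy runtime model: serial part + divided parallel part + coordination."""
    coordination = overhead_per_thread_s * (n_threads - 1)
    return serial_s + parallel_s / n_threads + coordination


def best_thread_count(candidates, serial_s, parallel_s, overhead_per_thread_s):
    """Return the candidate thread count with the lowest modelled runtime."""
    return min(
        candidates,
        key=lambda n: modelled_runtime(
            n, serial_s, parallel_s, overhead_per_thread_s
        ),
    )


# Hypothetical values in minutes: 100 serial, 40 parallelizable,
# 10 of coordination cost per additional thread.
for n in (1, 2, 4, 8, 16):
    print(n, modelled_runtime(n, 100, 40, 10))
print("best:", best_thread_count((1, 2, 4, 8, 16), 100, 40, 10))
```

In this model the runtime first drops and then rises again once the coordination cost exceeds what the extra threads save, so the best thread count can be well below the number of cores.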

Also, you can't compare runtimes between different numbers of threads based on a couple of runs. For each condition, you should have at least 10 replicates (imo) because of random variation in runtime, especially with multithreaded software.
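A minimal sketch of how replicate timings could be collected and summarized. It measures wall-clock time per run; `time_runs` and `summarize` are hypothetical helper names, and the placeholder command (a Python process that does nothing) stands in for the real alignment command.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, repeats):
    """Run cmd `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (mean, sample standard deviation) of the timings."""
    mean = statistics.mean(timings)
    spread = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return mean, spread


if __name__ == "__main__":
    # Placeholder command; substitute the real command as a list of arguments.
    placeholder = [sys.executable, "-c", "pass"]
    mean, spread = summarize(time_runs(placeholder, repeats=10))
    print(f"mean {mean:.3f} s, standard deviation {spread:.3f} s")
```

Comparing the mean and spread for each thread count makes it easier to tell a real difference from run-to-run noise.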

ADD COMMENT • link 6.9 years ago by novice ★ 1.1k

1

So my results are correct, and the problem is synchronization and communication between multiple threads. Because two threads need less synchronization and communication time, performance is good with two threads. Synchronization and communication time increases as we add threads, which is why alignment time increases with 4, 8 and 16 threads.

ADD REPLY • link 6.9 years ago by mbkmastkhel ▴ 10

0

Try three threads :)

ADD REPLY • link 6.9 years ago by mforde84 ★ 1.4k

0

Thank you so much

ADD REPLY • link 6.9 years ago by mbkmastkhel ▴ 10