I used TopHat (v. 2.1.1) to align RNA-Seq reads of zebrafish to a genome (Danio_rerio.GRCz10.pep.abinitio.fa) on a system with the following specifications:
Memory: 4GB RAM
Processor: Intel® Core™ i5-4590 CPU @ 3.30GHz × 4
OS: Ubuntu 16.04 LTS (64-bit)
Hardware: 64-bit architecture
time tophat --solexa-quals -g 2 -p 1 --no-coverage-search -j annotation/Danio_rerio.Zv9.66.spliceSites -o tophatoutput/ZV9 genome/zebrafish data/SRR630464_1.fastq data/SRR630464_2.fastq
TopHat Command Parameters
TopHat command parameters that we used are listed below:
-g Maximum number of multi-hits allowed. Short reads are likely to map to more than one location in the genome, even though each read can have originated from only one of these regions. In RNA-Seq we allow a restricted number of multi-hits; in this case we ask TopHat to report only reads that map to at most 2 different loci.
-p Use this many threads to align reads.
--library-type Before performing any type of RNA-Seq analysis you need to know a few things about the library preparation. Was it done using a strand-specific protocol or not? If yes, which strand? In our data the protocol was NOT strand specific.
--no-coverage-search Disable the coverage-based search for junctions, to reduce run time and memory use.
-j Improve spliced alignment by providing TopHat with annotated splice junctions. Pre-existing genome annotation is an advantage when analyzing RNA-Seq data. This file contains the coordinates of annotated splice junctions from Ensembl. These are stored under the sub-directory annotation in a file called
-o This specifies the subdirectory in which TopHat should save the output files.
My question: increasing the number of threads (multi-threading) should reduce the time it takes to align the reads to the genome. However, as the timings below show, the alignment time decreases only with 2 threads; with 4, 8 and 16 threads it rises again to about the single-thread level instead of decreasing further. Why?
Files: FASTQ files of zebrafish, accession number GSE42846: SRR630464_1.fastq and SRR630464_2.fastq
Size: 4.2 GB per file
Nucleotide sequences: 24,410,561 sequences per file
1 Thread: User time is 137 min 0 sec and system time is 5 min 12 sec and total time is 142 min 12 sec
2 Threads: User time is 120 min 33 sec and system time is 6 min 40 sec and total time is 127 min 13 sec
4 Threads: User time is 124 min 20 sec and system time is 19 min 54 sec and total time is 144 min 14 sec
8 Threads: User time is 122 min 41 sec and system time is 18 min 31 sec and total time is 141 min 12 sec
16 Threads: User time is 122 min 22 sec and system time is 21 min 27 sec and total time is 143 min 49 sec
Total time = User time + System Time
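To make the comparison easier to check, here is a minimal Python sketch that recomputes the totals from the user and system times listed above and prints the change relative to the 1-thread run. The names (`TIMINGS`, `cpu_total_seconds`, `format_min_sec`) are my own and only for illustration. Note that user + system is CPU time added up over all threads; the `time` command also prints a `real` line (elapsed wall-clock time), which is not included in these numbers and may behave differently.

```python
# Timings copied from the post: threads -> ((user_min, user_sec), (sys_min, sys_sec))
TIMINGS = {
    1: ((137, 0), (5, 12)),
    2: ((120, 33), (6, 40)),
    4: ((124, 20), (19, 54)),
    8: ((122, 41), (18, 31)),
    16: ((122, 22), (21, 27)),
}


def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) pair to seconds."""
    return minutes * 60 + seconds


def cpu_total_seconds(threads):
    """User + system time in seconds for a given thread count."""
    user, system = TIMINGS[threads]
    return to_seconds(*user) + to_seconds(*system)


def format_min_sec(total_seconds):
    """Format seconds in the 'M min S sec' style used in the post."""
    return f"{total_seconds // 60} min {total_seconds % 60} sec"


baseline = cpu_total_seconds(1)
for threads in sorted(TIMINGS):
    total = cpu_total_seconds(threads)
    change = 100.0 * (total - baseline) / baseline
    print(f"{threads:>2} threads: {format_min_sec(total)} ({change:+.1f}% vs 1 thread)")
```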
We used 1, 2, 4, 8 and 16 threads to align reads.
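For anyone who wants to repeat the measurement, this is a rough sketch of how the runs could be scripted so that elapsed (real) time is recorded next to user and system time for each thread count. It assumes `tophat` is on the PATH and reuses the paths from the command above; the per-run output directory suffix (`_p<threads>`) and the function names are hypothetical additions of mine so that runs do not overwrite each other. Child CPU times from `os.times()` are only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def build_tophat_cmd(threads):
    """Build the TopHat command from the post for a given -p value.

    The '_p<threads>' output suffix is a hypothetical addition, not part
    of the original command.
    """
    return [
        "tophat", "--solexa-quals", "-g", "2", "-p", str(threads),
        "--no-coverage-search",
        "-j", "annotation/Danio_rerio.Zv9.66.spliceSites",
        "-o", f"tophatoutput/ZV9_p{threads}",
        "genome/zebrafish",
        "data/SRR630464_1.fastq", "data/SRR630464_2.fastq",
    ]


def time_command(cmd):
    """Run cmd and return elapsed, user and system time in seconds."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    real = time.perf_counter() - start
    after = os.times()
    return {
        "real": real,  # wall-clock time
        "user": after.children_user - before.children_user,
        "sys": after.children_system - before.children_system,
    }


def run_benchmark(thread_counts, build_cmd=build_tophat_cmd):
    """Time one run per thread count and return {threads: timings}."""
    results = {}
    for threads in thread_counts:
        results[threads] = time_command(build_cmd(threads))
        if os.cpu_count() and threads > os.cpu_count():
            # More threads than logical CPUs: they must share cores.
            print(f"note: {threads} threads > {os.cpu_count()} CPUs", file=sys.stderr)
    return results
```

Calling `run_benchmark([1, 2, 4, 8, 16])` would start the five alignments one after another; comparing the `real` values across thread counts shows whether the elapsed time actually changes.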
Kindly help me out to find a valid reason.
Thanks in advance.