Question: Speeding Up the BLAST Job
Asked 6.3 years ago by User000:

Hello, I am trying to run BLASTX on my contig database against the UniProt/TrEMBL protein database. It takes ~20 min per contig and I have thousands of them, so in total it will take about 2 months! Is there a way to speed up the blastx job? Note: I am already using clusters and I have already split my contigs into smaller files. This is the command line I use:

blastall -p blastx -e 0.001 -m 8 -S 1 -i input.fasta -d trembl.fasta -o output
(question written 6.3 years ago by User000; modified 6.3 years ago by Michael Dondrup)

It looks like you are using legacy BLAST; have you tried BLAST+? What is your computer setup: how many nodes, CPUs, etc.? How did you split your contigs? Are all cores already running at 100% load all the time? If not, there is a tutorial on this site on using GNU Parallel with BLAST: "GNU Parallel - parallelize serial command line programs without changing them".

(reply written 6.3 years ago by Michael Dondrup)
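The GNU Parallel approach runs one ordinary serial command per chunk file so that every core stays busy. Below is a rough standard-library Python analog for a single machine, offered as a sketch rather than as the tutorial's method. `run_chunks` and `blastall_cmd` are hypothetical helpers, and the per-chunk command simply reuses the blastall options from the question.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def blastall_cmd(chunk_path: str) -> List[str]:
    """Hypothetical per-chunk command, copied from the blastall line in the question;
    each chunk writes to its own output file."""
    return ["blastall", "-p", "blastx", "-e", "0.001", "-m", "8", "-S", "1",
            "-i", chunk_path, "-d", "trembl.fasta", "-o", chunk_path + ".out"]


def run_chunks(chunk_paths: Sequence[str],
               make_cmd: Callable[[str], List[str]],
               max_jobs: int) -> List[int]:
    """Run make_cmd(path) for every chunk, at most max_jobs at a time.

    Returns the exit codes in the same order as chunk_paths.
    """
    def run_one(path: str) -> int:
        # Each thread only waits on its child process; the child does the real work.
        return subprocess.run(make_cmd(path), check=False).returncode

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run_one, chunk_paths))
```

For example, `run_chunks([f"part.{k}.fasta" for k in range(8)], blastall_cmd, max_jobs=8)` keeps eight blastall processes running until all chunks are done. The chunk names here are hypothetical. Threads are enough because the workers only wait on external processes.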

I haven't tried BLAST+ yet, since I have difficulty downloading anything on this computer, so I am basically using what is already installed. I split my 200000 transcripts into 7 files using a Python script, and I run BLAST in the background in remote mode, sending the jobs to the university's computer cluster. To finish in 6 days I would need to split my files into 250 parts.

(reply written 6.3 years ago by User000)
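The poster's own splitting script is not shown. Here is a minimal sketch of what splitting a FASTA file into n parts could look like. All function names and the output naming scheme are hypothetical. Records are dealt out round-robin, so each part gets about the same number of sequences.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq_parts = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        elif header is not None:
            seq_parts.append(line.strip())
    if header is not None:
        yield header, "".join(seq_parts)


def split_round_robin(records: Iterable[Record], n_parts: int) -> List[List[Record]]:
    """Deal records into n_parts lists like cards, so each gets about the same count."""
    parts: List[List[Record]] = [[] for _ in range(n_parts)]
    for i, rec in enumerate(records):
        parts[i % n_parts].append(rec)
    return parts


def write_parts(parts: List[List[Record]], prefix: str) -> List[str]:
    """Write each part to '<prefix>.<k>.fasta' (60 residues per line); return the paths."""
    paths = []
    for k, part in enumerate(parts):
        path = f"{prefix}.{k}.fasta"
        with open(path, "w") as out:
            for header, seq in part:
                out.write(f">{header}\n")
                for start in range(0, len(seq), 60):
                    out.write(seq[start:start + 60] + "\n")
        paths.append(path)
    return paths
```

For the 250 parts mentioned above, one could call `write_parts(split_round_robin(read_fasta(f), 250), "transcripts")` on an open input file `f`. The file name is hypothetical.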

Some extra info might help. What version of BLAST are you using? What database are you blasting against (nr? online or offline?) What is the length of your contigs? Etc.

(reply written 6.3 years ago by Biojl)

I am using blastall, blasting my plant transcripts (~200000 sequences) against TrEMBL (~45 million protein sequences). My contig lengths range from ~300 bp (min) to ~22000 bp (max), with an average of ~1500 bp.

(reply written 6.3 years ago by User000)
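Contig lengths range from ~300 to ~22000 bp, so parts with equal sequence counts can take very different times to finish. The sketch below balances parts by total length instead. It assumes, as a rough approximation only, that BLAST run time grows with the number of query residues. It takes (header, sequence) pairs from any FASTA parser, and the function names are hypothetical. The technique is the greedy "longest first" heuristic: each record goes to the currently lightest part.

```python
import heapq
from typing import List, Sequence, Tuple

Record = Tuple[str, str]  # (header, sequence)


def split_by_length(records: Sequence[Record], n_parts: int) -> List[List[Record]]:
    """Greedy longest-first split: each record goes to the part with the fewest residues.

    Assumes run time is roughly proportional to total sequence length,
    which is only an approximation for BLAST.
    """
    parts: List[List[Record]] = [[] for _ in range(n_parts)]
    # Min-heap of (total residues so far, part index): the lightest part is on top.
    heap = [(0, k) for k in range(n_parts)]
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, k = heapq.heappop(heap)
        parts[k].append(rec)
        heapq.heappush(heap, (total + len(rec[1]), k))
    return parts


def part_sizes(parts: List[List[Record]]) -> List[int]:
    """Total residues per part, to check how even the split is."""
    return [sum(len(seq) for _, seq in part) for part in parts]
```

Placing the longest contigs first matters. A single 22000 bp contig then lands in its own light part rather than on top of an already full one.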

blastall is the old version. Have you tried any BLAST+ version (i.e. newer than 2.2.28+)? It is significantly faster and allows multicore usage.

(reply written 6.3 years ago by Biojl)
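If BLAST+ becomes available, the blastall line from the question maps onto a separate `blastx` program with renamed options, plus `-num_threads` for multicore use. The sketch below does that translation. `translate_blastall` is a hypothetical helper that only handles the options used in the question, plus the common `-m` output formats. The database may also need to be (re)formatted for BLAST+ with makeblastdb.

```python
import shlex
from typing import List

# Legacy blastall option -> BLAST+ option, for the options used in the question.
FLAG_MAP = {"-e": "-evalue", "-i": "-query", "-d": "-db", "-o": "-out"}
# -m (legacy output format) -> -outfmt (BLAST+): pairwise, XML, tabular, tabular+comments.
OUTFMT_MAP = {"0": "0", "7": "5", "8": "6", "9": "7"}
# -S (legacy query strand: 1 top, 2 bottom, 3 both) -> -strand (BLAST+).
STRAND_MAP = {"1": "plus", "2": "minus", "3": "both"}


def translate_blastall(cmdline: str, num_threads: int = 1) -> List[str]:
    """Turn a simple 'blastall -p PROGRAM ...' line into a BLAST+ argument list."""
    args = shlex.split(cmdline)
    if not args or args[0] != "blastall":
        raise ValueError("expected a blastall command line")
    # Legacy options come as flag/value pairs: -p blastx -e 0.001 ...
    opts = dict(zip(args[1::2], args[2::2]))
    program = opts.pop("-p")  # in BLAST+ each program (blastx, blastp, ...) is its own binary
    cmd = [program]
    for flag, value in opts.items():
        if flag in FLAG_MAP:
            cmd += [FLAG_MAP[flag], value]
        elif flag == "-m":
            cmd += ["-outfmt", OUTFMT_MAP[value]]
        elif flag == "-S":
            cmd += ["-strand", STRAND_MAP[value]]
        else:
            raise ValueError(f"{flag} is not handled by this sketch")
    cmd += ["-num_threads", str(num_threads)]  # BLAST+ multicore option
    return cmd
```

The resulting list can be passed directly to `subprocess.run(..., check=True)`. With `num_threads=8`, the question's command becomes `blastx -evalue 0.001 -outfmt 6 -strand plus -query input.fasta -db trembl.fasta -out output -num_threads 8`.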

I am using a Debian machine at the university, and it is impossible to install anything there: 1) it is very old and needs updating; 2) I have no root access. Any other suggestions? Of course, if there is no other possible solution I will do my best to follow the one you suggested.

(reply written 6.3 years ago by User000)

Does anyone know how to check the progress, i.e. how many sequences have finished blasting and how many remain? I output in format 5 (XML).

(reply written 6.3 years ago by Adrian Pelin)

AFAIK you can't get an exact progress report, because the output is buffered. It is even worse with XML output, because the file is not well-formed XML until the job is finished. Try the standard output format instead. I sometimes check progress using grep -ce "Query=" blastout, because "Query=" occurs exactly once at the start of each result section.

(reply written 6.3 years ago by Michael Dondrup)
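The grep check above can be turned into a small progress report. The sketch below uses hypothetical helper names. It counts finished queries in pairwise output by "Query=" lines, the same lines `grep -c` counts. For tabular output (-m 8 / -outfmt 6) it counts distinct IDs in the first column instead. Both are compared against the number of ">" headers in the input. Because output is buffered, these counts lag behind the real progress. In tabular output, queries without hits never appear, so that count is only a lower bound.

```python
from typing import Iterable, Set


def count_queries(fasta_lines: Iterable[str]) -> int:
    """Number of sequences in the input FASTA (one '>' header each)."""
    return sum(1 for line in fasta_lines if line.startswith(">"))


def count_done_pairwise(lines: Iterable[str]) -> int:
    """Lines containing 'Query=' in pairwise output, as grep -c would count them."""
    return sum(1 for line in lines if "Query=" in line)


def count_done_tabular(lines: Iterable[str]) -> int:
    """Distinct query IDs (column 1) in tabular output; queries with no hits are missing."""
    seen: Set[str] = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            seen.add(line.split("\t", 1)[0])
    return len(seen)


def progress(done: int, total: int) -> str:
    """Format a simple 'done/total (percent)' report."""
    return f"{done}/{total} queries ({100.0 * done / total:.1f}%)"
```

For example, with the input FASTA open as `q` and the BLAST output open as `r`, `print(progress(count_done_pairwise(r), count_queries(q)))` prints a line such as "2/8 queries (25.0%)".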

I have a big FASTA file with ORFs that I want to blast. If I have 32 cores on my workstation, does it make more sense to split the FASTA file and start separate BLAST tasks?

In other words, is it faster to split the FASTA into 4 files and assign 8 cores per file using blastp, rather than just giving the entire, unsplit FASTA file all 32 cores?

(reply written 6.3 years ago by Adrian Pelin)

I can't explain exactly why, but in my experience the last option is faster; you could run a little benchmark if you want to find out for sure.

(reply written 6.3 years ago by Michael Dondrup)
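A little benchmark along those lines could be scripted as below. It runs the same subset of ORFs once as a single blastp with 32 threads and once as 4 chunks with 8 threads each, then compares wall-clock times. This is a sketch only. `time_commands`, `blastp_config`, the database name and the chunk files are hypothetical. Each configuration is worth running more than once, because on later runs the database may already be cached in memory.

```python
import subprocess
import time
from typing import List, Sequence


def time_commands(cmds: Sequence[List[str]]) -> float:
    """Start all commands at once, wait for all of them, return wall-clock seconds."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    codes = [proc.wait() for proc in procs]
    elapsed = time.perf_counter() - start
    failed = [proc.args for proc, code in zip(procs, codes) if code != 0]
    if failed:
        raise RuntimeError(f"commands failed: {failed}")
    return elapsed


def blastp_config(chunk_paths: Sequence[str], db: str, total_cores: int) -> List[List[str]]:
    """One blastp per chunk, sharing total_cores evenly (e.g. 4 chunks x 8 threads)."""
    threads = max(1, total_cores // len(chunk_paths))
    return [["blastp", "-query", path, "-db", db, "-outfmt", "6",
             "-out", path + ".out", "-num_threads", str(threads)]
            for path in chunk_paths]
```

For example, compare `time_commands(blastp_config(["subset.fasta"], "orf_db", 32))` with the same call on four chunk files made from that subset. Using the same input for both configurations keeps the comparison fair.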
Powered by Biostar version 2.3.0