Question: How Could I Run A Huge Number Of Blast Calls Faster?
3
4.2 years ago by
aliahmadvand200450 wrote:

For my project I need to split a genome into parts and run BLAST on each part. Given the size of the genome file, this means at least 4,000 BLAST calls, which would take a long time. I'm using NCBIQBlastService to run the alignments remotely, and from what I measured each request takes about 20 seconds, so all 4,000 would take around a day. Is there a faster way to do this? Any suggestions would be really appreciated. By the way, this might help too: http://biojava.org/wiki/BioJava:CookBook3:NCBIQBlastService

genome ncbi blast • 3.4k views
modified 2.8 years ago by qiyunzhu120 • written 4.2 years ago by aliahmadvand200450
4
4.2 years ago by
Josh Herr5.4k
University of Nebraska
Josh Herr5.4k wrote:

There are multiple ways to speed up a BLAST analysis. For a start, running BLAST locally will be faster than sending all the data back and forth to NCBI. Could you run BLAT instead?

written 4.2 years ago by Josh Herr5.4k
3

I guess if he runs BLAT, he will miss a lot of homologous sequences he might be interested in, because BLAST is more sensitive than BLAT: BLAST uses a smaller word size (3, for protein searches) when it looks for homologous sequences, whereas BLAT uses a longer word size. I usually don't prefer BLAT over BLAST unless I'm looking for highly similar sequences or doing mapping. Even BLAT will take quite a long time unless you run it in parallel, which means dividing your sequences into many segments, running BLAT on each, and finally putting the output back together.
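The divide, run, and merge idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`read_fasta`, `split_records`, `to_fasta`, `merge_outputs`) are hypothetical, the input is plain FASTA text, and the per-part results are line-oriented tabular output that can simply be concatenated. The actual BLAT or BLAST run on each part is left out.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records: List[Record], n_parts: int) -> List[List[Record]]:
    """Distribute records round-robin into at most n_parts non-empty groups."""
    parts = [records[i::n_parts] for i in range(n_parts)]
    return [p for p in parts if p]


def to_fasta(records: List[Record]) -> str:
    """Serialize records back to FASTA text, one sequence line per record."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


def merge_outputs(outputs: List[str]) -> str:
    """Concatenate per-part tabular outputs, dropping blank lines."""
    lines = []
    for out in outputs:
        lines.extend(l for l in out.splitlines() if l.strip())
    return "\n".join(lines) + ("\n" if lines else "")
```

Each group returned by `split_records` would be written to its own file with `to_fasta` and searched separately. Because the split is round-robin, the merged hits will not be in the original query order, so sort them afterwards if order matters.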

modified 4.2 years ago • written 4.2 years ago by bioinfo620
1

+1, and I absolutely agree about BLAT. For a lot of what I do, BLAT suffices and saves a little bit of time. When you have to identify millions of environmental sequences against an extremely large database, you're not exactly going to get high levels of confidence anyway.

modified 4.2 years ago • written 4.2 years ago by Josh Herr5.4k
2
4.2 years ago by
Alex Reynolds19k
Seattle, WA USA
Alex Reynolds19k wrote:

Perhaps you could send your searches in batches of 25 jobs at a time to EMBL-EBI's NCBI BLAST REST-based service. At a 25:1 ratio, a set of jobs that takes a day would take a little less than an hour (all other things being equal).
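A minimal sketch of keeping 25 jobs in flight at once, using Python's standard thread pool. The `search` callable is a placeholder that you would supply yourself (for example, a function that submits one query to a remote service and waits for its result). It is an assumption of this sketch and not part of any service's client library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_in_batches(queries: Iterable[str],
                   search: Callable[[str], str],
                   max_in_flight: int = 25) -> List[str]:
    """Run search(query) for every query, at most max_in_flight at a time.

    Results come back in the same order as the input queries.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(search, queries))


def estimated_hours(n_jobs: int, seconds_per_job: float,
                    max_in_flight: int) -> float:
    """Idealized wall-clock estimate: total work divided evenly over the slots."""
    return n_jobs * seconds_per_job / max_in_flight / 3600.0
```

With the numbers from the question, `estimated_hours(4000, 20, 1)` is about 22.2 hours and `estimated_hours(4000, 20, 25)` is about 0.89 hours (roughly 53 minutes), assuming the service sustains 25 concurrent jobs without slowing down.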

modified 4.2 years ago • written 4.2 years ago by Alex Reynolds19k
2

EMBL-EBI provides a range of sequence similarity search services, for which SOAP and REST web service interfaces are available (see https://www.ebi.ac.uk/Tools/webservices/#sequence_similarity_search_sss) as well as web interfaces (http://www.ebi.ac.uk/Tools/sss/). Sample clients are provided, along with some suggestions on implementing batch analysis (https://www.ebi.ac.uk/Tools/webservices/help/faq#how_can_i_analyse_multiple_sequences).

NCBI's BLAST web services (see http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=DeveloperInfo) have different usage restrictions, one of which limits the frequency of requests. Given an average job runtime of 20 s, a limit of one request per 3 s suggests that about 6 jobs could run in parallel. If the query sequences can be batched so that each job performs 10 searches, then the average job time would increase to about 200 s, and since it is the request frequency that is limited, that translates into significant parallelism.
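The arithmetic behind that estimate can be written out as a small Python sketch. It assumes that job runtime scales linearly with the number of searches per job, that only job submissions count against the request limit (result polling is ignored), and that the server runs every submitted job immediately. The function names are made up for illustration.

```python
import math


def jobs_in_flight(job_seconds: float, seconds_between_requests: float) -> int:
    """How many jobs are running at once if one is submitted per interval."""
    return int(job_seconds // seconds_between_requests)


def wall_clock_seconds(n_searches: int, searches_per_job: int,
                       seconds_per_search: float,
                       seconds_between_requests: float) -> float:
    """Time until the last job finishes when submissions are rate-limited."""
    n_jobs = math.ceil(n_searches / searches_per_job)
    job_seconds = searches_per_job * seconds_per_search
    # The last job is submitted after (n_jobs - 1) intervals, then has to run.
    return (n_jobs - 1) * seconds_between_requests + job_seconds
```

Under these assumptions, `jobs_in_flight(20, 3)` gives 6 and `jobs_in_flight(200, 3)` gives 66. For 4,000 searches, `wall_clock_seconds(4000, 1, 20, 3)` is 12017 s (about 3.3 hours), while `wall_clock_seconds(4000, 10, 20, 3)` is 1397 s (about 23 minutes).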

All that said, the databases available at EMBL-EBI are not the same as those available from NCBI, so the database choice may force the use of one particular service.

written 4.2 years ago by Hamish2.9k
1
4.2 years ago by
Eric Normandeau9.4k
Quebec, Canada
Eric Normandeau9.4k wrote:

As suggested by Josh Herr, you can run BLAST on your own computer to get through a large number of searches faster.

From my perspective, the easiest way to do that would be to have Linux (or Mac OS X) installed on a computer, install BLAST and the desired databases, and launch the searches.
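As a rough sketch of what launching a local search could look like once the NCBI BLAST+ command-line tools are installed, the Python helpers below build the `makeblastdb` and `blastn` command lines. The file names, database name, thread count, and E-value are placeholders chosen for illustration, and a nucleotide-versus-nucleotide search is assumed.

```python
import subprocess
from typing import List


def makeblastdb_cmd(fasta_path: str, db_name: str, dbtype: str = "nucl") -> List[str]:
    """Command that formats a FASTA file as a local BLAST database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


def blastn_cmd(query_path: str, db_name: str, out_path: str,
               threads: int = 4, evalue: float = 1e-5) -> List[str]:
    """Command that searches a query FASTA against a local nucleotide database.

    "-outfmt 6" requests tab-separated output, which is easy to merge later.
    """
    return ["blastn", "-query", query_path, "-db", db_name, "-out", out_path,
            "-outfmt", "6", "-evalue", str(evalue), "-num_threads", str(threads)]


def run(cmd: List[str]) -> None:
    """Execute a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)
```

For example, `run(makeblastdb_cmd("genome.fa", "genome_db"))` followed by `run(blastn_cmd("parts.fa", "genome_db", "hits.tsv", threads=8))` would search every sequence in `parts.fa` in a single call, so the 4,000 fragments do not need 4,000 separate invocations.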

If you have no experience with UNIX-like systems, you would probably need help from someone who is knowledgeable about them.

If you tell us a bit more about your experience, the computer you use or could use in the lab (installed systems, number of CPUs), we may be able to help you some more.

written 4.2 years ago by Eric Normandeau9.4k
0
2.8 years ago by
qiyunzhu120
United States
qiyunzhu120 wrote:

This thread has been around for a long time, but I would like to add a tip for those who run ncbi-blast+ on their own computers: if you place the database on a fast storage device (e.g., an SSD), you will get a *dramatic* gain in speed!

written 2.8 years ago by qiyunzhu120

Did you benchmark this? I would have guessed the OS would cache the most frequently used pages in memory anyway.

modified 2.8 years ago • written 2.8 years ago by Jeremy Leipzig17k

I didn't do a serious benchmark, but I estimated a 3- to 10-fold increase in speed. I also think memory will make a key contribution if it is large enough (I guess 128 GB is necessary for the whole nr) and if I can load the whole database into memory somehow.
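One quick way to check whether a database could plausibly stay in memory is to add up the sizes of its files and compare the total with the available RAM. The Python sketch below assumes the database files all share a common path prefix followed by a dot (as in `nr.00.phr` or `mydb.nsq`). The function names and the 80% headroom factor are arbitrary choices for illustration.

```python
import glob
import os


def database_bytes(db_prefix: str) -> int:
    """Total size of all files named '<db_prefix>.<something>'."""
    pattern = glob.escape(db_prefix) + ".*"
    return sum(os.path.getsize(p) for p in glob.glob(pattern) if os.path.isfile(p))


def fits_in_memory(db_size: int, ram_bytes: int, headroom: float = 0.8) -> bool:
    """True if the database fits within a fraction of RAM, leaving room for the rest."""
    return db_size <= ram_bytes * headroom
```

If the total fits, reading the database files once should let the operating system's page cache hold them for later runs, which is the effect the caching question above refers to. If it does not fit, faster storage is more likely to make a difference.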

written 2.6 years ago by qiyunzhu120