standalone blastp: increasing word size extremely slows down the search
4.4 years ago

Hello,

I need to run blastp with one genome (15,000 sequences) against another genome (12,000 sequences) using Biopython. I decided to use local BLAST and query the genome 1 FASTA file against a genome 2 database (made with the makeblastdb command from the second genome's FASTA file). The search runs fine with the default parameters of standalone blastp. However, when I increase the word size (the default is 3 and I set it to 6), BLAST runs extremely slowly. I am confused about why this happens, because increasing the word size is supposed to make the search faster. Here is how I pass the arguments to the NcbiblastpCommandline function:

NcbiblastpCommandline( word_size=6, query=queryInputPath, db=subjectInputPath, out=outputPath, outfmt=5 )()

Things are much faster when the call does not include the word_size=6 keyword argument. Without word_size=6 the search takes around 1.5 h. My Mac has 4 GB of RAM and a 1.6 GHz Intel Core i5 processor. What may be the cause?

genome blast • 2.0k views

Check that you're not running out of memory.


With 4 GB of RAM, that is very likely.
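One way to check this is to run the search as a child process and read its peak memory once it finishes. The following is only a minimal sketch, assuming a Unix-like system (Python's resource module is not available on Windows). The helper name peak_memory_of is made up for illustration, and the file names in the commented usage line are placeholders.

```python
import resource
import subprocess
import sys


def peak_memory_of(command):
    """Run `command` and return (exit code, peak resident memory in MiB).

    ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    With RUSAGE_CHILDREN it describes the largest child process that has
    been waited for so far, so call this once per fresh Python session.
    """
    completed = subprocess.run(command)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, peak / divisor


# Hypothetical usage (placeholder paths, same options as in the question):
# peak_memory_of(["blastp", "-query", "genome1.fasta", "-db", "genome2_db",
#                 "-word_size", "6", "-outfmt", "5", "-out", "result.xml"])
```

If the reported peak comes close to the machine's physical RAM, the slowdown is probably caused by swapping rather than by the search itself.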


You may be able to save some overhead if you run BLAST directly from the command line, although probably not a meaningful amount. You could also try splitting the database into multiple parts; just make sure you set the statistical options (e.g. dbsize) manually so that E-values stay comparable across parts. You'll have to do some post-BLAST work to find the best hits, but this should get you around the memory issues.
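A rough sketch of the split-database idea, under some assumptions: the database has already been split into parts with makeblastdb, total_db_letters is the number of residues in the undivided database, and the output is switched to tabular format (-outfmt 6) instead of the XML used in the question, because tabular rows are easier to merge. The function names blastp_command and best_hits are made up for illustration.

```python
import csv


def blastp_command(query, db_part, out_path, total_db_letters, word_size=None):
    """Build a blastp argument list for one database part.

    -dbsize pins the effective database length, so E-values are computed
    as if the whole database had been searched.
    """
    cmd = [
        "blastp",
        "-query", query,
        "-db", db_part,
        "-out", out_path,
        "-outfmt", "6",  # tabular; column 12 is the bit score
        "-dbsize", str(total_db_letters),
    ]
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    return cmd


def best_hits(tabular_lines):
    """Keep the highest-bit-score row per query from outfmt 6 lines."""
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, bitscore = row[0], float(row[11])
        if query not in best or bitscore > float(best[query][11]):
            best[query] = row
    return best
```

Each command list can be passed to subprocess.run, and the lines of all per-part output files can be concatenated before calling best_hits. Ranking by bit score is one possible choice; ranking by E-value would work the same way.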


Hi Aleksander, long shot, but did you ever figure out why increasing the word size slows down the search? I have the same problem with blastp version 2.11.0, and it does not look like I'm reaching any memory limit. Cheers, Henrietta

4.4 years ago

I would recommend using BLAST replacements like DIAMOND or PAUDA:

https://ab.inf.uni-tuebingen.de/software/diamond

https://ab.inf.uni-tuebingen.de/software/pauda


Just to be clear: there is no point in trying to use these tools on the machine described in the original post.
