Question: makeblastdb and blast very slow for 1 Ensembl genome, not for others
3.4 years ago, memory_donk (Australia) wrote:

Hi Biostars,

I'm running a program called BUSCO on several vertebrate genome assemblies (not so much for genome assessment as to collect single-copy orthologs from their database). BUSCO mostly wraps several programs together (tblastn/augustus/hmmer). One genome in particular, though, the opossum (Monodelphis domestica), has been giving me some trouble.

BUSCO first runs makeblastdb, which took nearly 24 hours for the opossum (compared to about 10 minutes for the dog and about 20 for the wallaby). BUSCO also runs threaded tblastn. For all of my other genomes I've run it with 16-20 cores and the whole search lasts no longer than 2 hours, but when I run it on the opossum genome, it only uses 4 cores even if I request 16, and 24 hours in it's still going (though only adding something to the results file every few hours).

I've tried running makeblastdb and tblastn separately and observed the same problem, but only on the opossum. I've also checked the formatting and it looks identical to the other Ensembl genomes (headers use the same format as dog, lines organized in the same way, all are repeat masked and use "N" for masked data). Blast also gives me no error messages and doesn't exit, but it will hang for long stretches and will stop using resources on my computer.
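A formatting comparison like the one described above can also be scripted rather than done by eye. The following is a minimal Python sketch and not part of BUSCO or BLAST; the function name `summarize_fasta` and the choice of statistics are assumptions made for illustration. It reports the sequence count, total and longest sequence length, the fraction of "N" bases, and the line widths in use, so that two genome files can be compared side by side.

```python
from collections import Counter


def summarize_fasta(lines):
    """Return basic layout statistics for FASTA text given as an iterable of lines."""
    lengths = []          # one entry per sequence, in file order
    n_count = 0           # bases written as N (masked or unknown)
    widths = Counter()    # how often each sequence-line width occurs
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)            # a header starts a new sequence
        elif lengths:                    # ignore text before the first header
            lengths[-1] += len(line)
            n_count += line.upper().count("N")
            widths[len(line)] += 1
    total = sum(lengths)
    return {
        "sequences": len(lengths),
        "total_bases": total,
        "longest": max(lengths, default=0),
        "n_fraction": n_count / total if total else 0.0,
        "line_widths": dict(widths),
    }
```

Calling `summarize_fasta(open(path))` on each genome file (the paths being whatever the local copies are called) gives dictionaries that can be compared directly. A large difference in `longest` or `n_fraction` between the opossum and the other genomes would be worth a closer look, although this sketch alone cannot say whether such a difference explains the slowdown.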

I can't for the life of me figure out why this genome is behaving so differently when it is formatted correctly and being run the exact same way as my other genomes. Any ideas?


Why don't you run BLAST by itself? The program might have bugs.

written 3.4 years ago by Pappu

As I said in the post: "I've tried running makeblastdb and tblastn separately and observed the same problem, but only on the opossum."

I don't think the current release of blast has a bug. It also wouldn't make sense for the problem to arise for only one of several identically formatted Ensembl release genomes.

written 3.4 years ago by memory_donk

Did you try the BLAST package from the Ubuntu repository, or from another Linux distribution's repository?

written 3.4 years ago by Pappu

2.0 years ago, graham.etherington wrote:

Just to follow up on this rather old post (as I've had similar problems), it appears that tblastn may fail when running on multiple cores, so the recommendation is to use an older version, or just use one core.

Release notes, BUSCO v2.0.1 (March 2017): "This minor-update release incorporates additional checks on the status of the tBLASTn step to make sure it has completed correctly. Instead of just producing a warning message, BUSCO will now report an error and the run will terminate. This update has been implemented because it appears that when running on multiple cores, tBLASTn from BLAST+ versions 2.4, 2.5, and 2.6 may sometimes fail to complete. Solutions (for now) are to either roll back to an earlier BLAST+ version, or to run using only a single core."
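The workaround in the release notes (single core on BLAST+ 2.4 to 2.6) can be applied automatically when calling tblastn outside BUSCO. Below is a minimal Python sketch under stated assumptions: the helper names are hypothetical, the version text is assumed to look like `tblastn: 2.5.0+` as printed by `tblastn -version`, and the query, database, and output paths are placeholders. The set of affected versions is taken only from the notes quoted above.

```python
import re

# BLAST+ minor releases named in the BUSCO v2.0.1 release notes as sometimes
# failing to complete tBLASTn when run on multiple cores.
AFFECTED = {(2, 4), (2, 5), (2, 6)}


def parse_blast_version(version_output):
    """Extract (major, minor) from text such as 'tblastn: 2.5.0+' (assumed format)."""
    match = re.search(r"(\d+)\.(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    return int(match.group(1)), int(match.group(2))


def safe_thread_count(version_output, requested):
    """Fall back to a single thread on the affected releases, else keep the request."""
    if parse_blast_version(version_output) in AFFECTED:
        return 1
    return requested


def tblastn_command(query, db, out, threads):
    """Build a tblastn argument list; all paths are placeholders chosen by the caller."""
    return [
        "tblastn",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "7",               # tabular output with comment lines
        "-num_threads", str(threads),
    ]
```

In practice the version text could be obtained with `subprocess.run(["tblastn", "-version"], capture_output=True, text=True).stdout`, and the list returned by `tblastn_command` passed to `subprocess.run`. Rolling back to an earlier BLAST+ release, the other solution mentioned in the notes, needs no code at all.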

23 months ago, lahcencampbell (Hinxton, UK) wrote:

I would just like to thank Graham for his very useful insight into BLAST failing with multiple cores. I had the exact same problem as memory_donk. I've been running BUSCO 2 all week with no problems on various genome assemblies. Then today, when setting up 3 new assemblies with the exact same setup as before, using BLAST v2.5, the runs were hanging at the BLAST db creation. This stage would normally take ~30-60 seconds to complete, but after 1 hr it was still writing to the BLAST db files and not crashing... simply running at a snail's pace. Note that I had specified 16 cores.

I switched over to BLAST v2.3 with nothing else changed, and the runs instantly got past the BLAST db creation stage. All seems fine now. Clearly a problem with BLAST v2.5 and multi-CPU calls.

Cheers !! Lahcen
