makeblastdb and blast very slow for 1 Ensembl genome, not for others
6.3 years ago
memory_donk ▴ 330

Hi Biostars,

I'm running a program called BUSCO on several vertebrate genome assemblies (not so much for genome assessment as to collect single-copy orthologs from their database). BUSCO mostly wraps several programs together (tblastn/augustus/hmmer). One genome in particular, though, the opossum (Monodelphis domestica), has been giving me some trouble.

BUSCO first runs makeblastdb, which took nearly 24 hours for the opossum (compared to ~10 minutes for the dog and ~20 for the wallaby). BUSCO also runs threaded tblastn. For all of my other genomes I've run it with 16-20 cores and the whole search lasts no longer than 2 hours, but when I run it on the opossum genome it only uses 4 cores even if I request 16, and 24 hours in it's still going (though only adding something to the results file every few hours).

I've tried running makeblastdb and tblastn separately and observed the same problem, but only on the opossum. I've also checked the formatting and it looks identical to the other Ensembl genomes (headers use the same format as dog, lines organized in the same way, all are repeat masked and use "N" for masked data). Blast also gives me no error messages and doesn't exit, but it will hang for long stretches and will stop using resources on my computer.
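One way to compare the opossum FASTA with the other genomes beyond eyeballing headers is to summarize each file numerically. The following is only a minimal sketch: the function name `fasta_stats` and the fields it reports are hypothetical choices for illustration, and it assumes a plain (uncompressed) FASTA with "N" or "n" used for masked bases. It is not part of BUSCO or BLAST+.

```python
from collections import namedtuple

# Hypothetical summary record: number of sequences, total bases,
# length of the longest sequence, and fraction of bases that are N/n.
FastaStats = namedtuple("FastaStats", "n_seqs total_len longest n_fraction")


def fasta_stats(lines):
    """Summarize FASTA content from an iterable of text lines."""
    n_seqs = 0
    total_len = 0
    longest = 0
    current = 0   # length of the sequence currently being read
    n_count = 0   # masked bases seen so far
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record and opens a new one.
            n_seqs += 1
            longest = max(longest, current)
            current = 0
        else:
            current += len(line)
            total_len += len(line)
            n_count += line.count("N") + line.count("n")
    longest = max(longest, current)  # account for the final record
    n_fraction = n_count / total_len if total_len else 0.0
    return FastaStats(n_seqs, total_len, longest, n_fraction)
```

It could be run on each genome with something like `with open("genome.fa") as fh: print(fasta_stats(fh))` (the file name is a placeholder). Large differences in sequence count, longest sequence, or masked fraction between the opossum and the other genomes would be worth noting, though this does not by itself identify the cause of the slowdown.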

I can't for the life of me figure out why this genome is behaving so differently when it is formatted correctly and being run the exact same way as my other genomes. Any ideas?

blast genome • 2.0k views

Why don't you run BLAST by itself? The program might have bugs.


As I said in the post: "I've tried running makeblastdb and tblastn separately and observed the same problem, but only on the opossum."

I don't think the current release of blast has a bug. It also wouldn't make sense that, out of several identically formatted Ensembl release genomes, the problem would only arise for one.


Did you try the blast package from the Ubuntu repository, or from another Linux distribution's repository?

4.9 years ago

Just to follow up on this rather old post (as I've had similar problems), it appears that tblastn may fail when running on multiple cores, so the recommendation is to use an older version, or just use one core.

Release notes, BUSCO v2.0.1, March 2017: "This minor-update release incorporates additional checks on the status of the tBLASTn step to make sure it has completed correctly. Instead of just producing a warning message, BUSCO will now report an error and the run will terminate. This update has been implemented because it appears that when running on multiple cores, tBLASTn from BLAST+ versions 2.4, 2.5, and 2.6 may sometimes fail to complete. Solutions (for now) are to either roll back to an earlier BLAST+ version, or to run using only a single core."
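The workaround in the release notes can be sketched as a small check that drops to one thread when the installed BLAST+ is one of the versions named there. This is only an illustration: the helper names (`parse_blast_version`, `safe_thread_count`, `tblastn_command`) are hypothetical and are not BUSCO's own code, and it assumes the output of `tblastn -version` contains a version string of the form `2.5.0`.

```python
import re

# BLAST+ (major, minor) versions that the BUSCO v2.0.1 release notes
# list as possibly failing to complete when tBLASTn runs on multiple cores.
AFFECTED = {(2, 4), (2, 5), (2, 6)}


def parse_blast_version(version_output):
    """Extract (major, minor, patch) from text such as 'tblastn: 2.5.0+'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    return tuple(int(part) for part in match.groups())


def safe_thread_count(version_output, requested):
    """Return 1 for the affected versions, otherwise the requested count."""
    major, minor, _patch = parse_blast_version(version_output)
    return 1 if (major, minor) in AFFECTED else requested


def tblastn_command(query, db, out, threads):
    """Build a tblastn argument list with an explicit thread count."""
    return [
        "tblastn",
        "-query", query,
        "-db", db,
        "-out", out,
        "-num_threads", str(threads),
    ]
```

The version text could be obtained with `subprocess.run(["tblastn", "-version"], capture_output=True, text=True).stdout`, and the resulting list passed to `subprocess.run`. The file and database names passed to `tblastn_command` are placeholders.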

4.8 years ago

I would just like to thank Graham for his very useful insight into BLAST failing with multiple cores. I had the exact same problem as memory_donk. I've been running BUSCO 2 all week with no problem on various genome assemblies. Then today, when setting up 3 new assemblies with the exact same setup as before, using BLAST v2.5, the runs were hanging at the BLAST db creation. It would normally take ~30-60 seconds to complete this stage, but after 1 hr it was still writing to the BLAST db files, and not crashing... simply running at a snail's pace. Note I had specified 16 cores.

I switched over to BLAST v2.3, nothing else was changed, and instantly the runs managed to get past the BLAST db creation stage. All seems fine now. Clearly a problem with BLAST v2.5 and multi-CPU calls.

Cheers !! Lahcen
