I'm running a program called BUSCO on several vertebrate genome assemblies (not so much for genome assessment as to collect single-copy orthologs from their database). BUSCO mostly wraps several programs together (tblastn/augustus/hmmer). One genome in particular, though, the opossum (Monodelphis domestica), has been giving me some trouble.
BUSCO first runs makeblastdb, which took nearly 24 hours for the opossum (compared to about 10 minutes for the dog and about 20 for the wallaby). BUSCO also runs threaded tblastn. For all of my other genomes I've run it with 16-20 cores and the whole search takes no longer than 2 hours, but when I run it on the opossum genome it only uses 4 cores even if I request 16, and 24 hours in it's still going (though it only adds something to the results file every few hours).
I've tried running makeblastdb and tblastn separately and observed the same problem, but only on the opossum. I've also checked the formatting and it looks identical to the other Ensembl genomes (headers use the same format as dog, lines are organized in the same way, all are repeat masked and use "N" for masked data). BLAST gives me no error messages and doesn't exit, but it hangs for long stretches and stops using resources on my computer.
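For anyone wanting to make the "formatting looks identical" comparison less by-eye, here is a minimal sketch that summarises a FASTA file so two genomes can be compared side by side. The function name `fasta_stats` and the choice of fields are my own assumptions for illustration, not part of BUSCO or BLAST+, and matching numbers would not by themselves rule out other causes.

```python
from collections import namedtuple

# Hypothetical summary record; the field names are illustrative only.
FastaStats = namedtuple(
    "FastaStats", "n_seqs total_bases n_fraction longest_seq max_line_len"
)


def fasta_stats(lines):
    """Summarise an iterable of FASTA lines (e.g. an open file handle)."""
    n_seqs = 0        # number of '>' header lines seen
    total = 0         # total residues across all sequences
    n_count = 0       # residues that are N or n (masked or unknown)
    longest = 0       # length of the longest single sequence
    current = 0       # running length of the sequence being read
    max_line = 0      # longest sequence line (reveals unwrapped files)

    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts: close off the previous one.
            longest = max(longest, current)
            current = 0
            n_seqs += 1
        else:
            max_line = max(max_line, len(line))
            current += len(line)
            total += len(line)
            n_count += line.count("N") + line.count("n")

    longest = max(longest, current)  # account for the final record
    n_fraction = n_count / total if total else 0.0
    return FastaStats(n_seqs, total, n_fraction, longest, max_line)
```

To use it, open each genome with `open(path)` and pass the file handle straight to `fasta_stats`, then compare the results for the opossum against the dog or wallaby. Large differences in sequence count, N fraction or line length would at least be worth a closer look.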
I can't for the life of me figure out why this genome is behaving so differently when it is formatted correctly and being run the exact same way as my other genomes. Any ideas?
Just to follow up on this rather old post (as I've had similar problems), it appears that tblastn may fail when running on multiple cores, so the recommendation is to use an older version, or just use one core.
Release notes, BUSCO v2.0.1 (March 2017): This minor-update release incorporates additional checks on the status of the tBLASTn step to make sure it has completed correctly. Instead of just producing a warning message, BUSCO will now report an error and the run will terminate. This update has been implemented because it appears that when running on multiple cores, tBLASTn from BLAST+ versions 2.4, 2.5, and 2.6 may sometimes fail to complete. Solutions (for now) are to either roll back to an earlier BLAST+ version, or to run using only a single core.
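If you script your runs, the workaround in the release notes can be applied automatically. Below is a rough sketch that reads the version string BLAST+ prints and drops to a single thread for the 2.4, 2.5 and 2.6 series named above. The helper names `parse_blast_version` and `safe_thread_count` are hypothetical, and the set of affected versions is taken only from the release notes quoted here, so treat it as an assumption rather than a complete list.

```python
import re

# (major, minor) pairs the BUSCO v2.0.1 release notes flag as possibly
# failing with multiple cores. Assumption: nothing outside this set is affected.
AFFECTED_SERIES = {(2, 4), (2, 5), (2, 6)}


def parse_blast_version(version_output):
    """Extract (major, minor, patch) from text such as 'tblastn: 2.5.0+'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    return tuple(int(part) for part in match.groups())


def safe_thread_count(version_output, requested):
    """Return 1 for the flagged BLAST+ series, else the requested count."""
    major, minor, _patch = parse_blast_version(version_output)
    if (major, minor) in AFFECTED_SERIES:
        return 1
    return requested
```

The version text can be captured with `subprocess.run(["tblastn", "-version"], capture_output=True, text=True).stdout`, and the returned number then passed as the CPU count to BUSCO or as `-num_threads` to tblastn. Rolling back to an earlier BLAST+ release, as the notes suggest, avoids the slowdown from running on one core.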
I would just like to thank Graham for his very useful insight into BLAST failing with multiple cores. I had exactly the same problem as memory_donk. I've been running BUSCO 2 all week with no problems on various genome assemblies. Then today, when setting up 3 new assemblies with the exact same setup as before, using BLAST v2.5, the runs were hanging at the BLAST db creation step. This stage would normally take ~30-60 seconds to complete, but after 1 hr it was still writing to the BLAST db files and not crashing... simply running at a snail's pace. Note that I had specified 16 cores.
I switched over to BLAST v2.3 with nothing else changed, and the runs instantly got past the BLAST db creation stage. All seems fine now. Clearly a problem with BLAST v2.5 and multi-CPU calls.
Cheers !! Lahcen