Trying to blast locally against nt but process hangs indefinitely
1
0
9 weeks ago

Hello !

I'm trying to BLAST one sequence against a locally downloaded copy of nt, but the process hangs forever and nothing is written to the output file.

For reference, I downloaded nt from the NCBI FTP, checked all the md5 checksums afterwards, and once everything was unzipped I ran blastdbcheck to make sure it was not corrupted.

My issue is that the following command just seems to do nothing at all for a long time (I always end up killing the process after 30-40 min). When I replace nt with nt.00, for instance, I do get results very quickly.

blastn -num_threads 16 -query oneseq.fasta -db nt -outfmt 7 -out test.tsv


For reference, my computer has 32 GB of RAM available, but never seems to use more than 5 GB (the rest is cache). I cannot find any "verbose" mode for BLAST to see whether it's trying to do something or is really stuck and hanging somehow.

If I understood BLAST's behavior properly, when the database is split into volumes like nt is (nt.00, nt.01 ... nt.77), BLAST searches every volume and then compiles the results together. Can it be that I just don't have enough memory to assign even a single sequence in reasonable time? I would expect to assign at least one sequence in a few minutes.
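One way to tell a slow search from a stuck one is to extend the nt.00 test to every volume, each with a time limit. The following is only a sketch: it assumes GNU coreutils `timeout` is installed, reuses the options from the command above, and the function name `scan_volumes` and the per-volume output names are made up for illustration.

```shell
# Search each volume on its own, with a time limit, and report which ones finish.
# Hypothetical helper: scan_volumes QUERY LIMIT VOLUME...
# LIMIT is passed to `timeout` (e.g. 300 for 300 seconds); exit status 124 means timed out.
scan_volumes() (
    query=$1
    limit=$2
    shift 2
    for vol in "$@"; do
        if timeout "$limit" blastn -num_threads 16 -query "$query" -db "$vol" \
               -outfmt 7 -out "test.${vol}.tsv"; then
            echo "$vol ok"
        else
            echo "$vol failed or timed out (exit status $?)"
        fi
    done
)

# Example (not run here):
#   scan_volumes oneseq.fasta 300 nt.00 nt.01 nt.02
```

A volume that consistently times out or fails while the others finish quickly would point at that volume's files rather than at memory.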

blast • 308 views
0

Okay, I figured it out. I downloaded the nt database again and this time it went fine. I guess there was something wrong with the database I had. I am now a bit surprised by blastdbcheck going through just fine.

2
9 weeks ago

What does "indefinitely" mean? Try leaving it on for longer...

You're right that BLAST unfortunately isn't very verbose.

• To test that everything is fine, you could reduce the length and complexity of your query, and perhaps increase the stringency (-evalue 1.0e-20, but see other posts too).
• You could also duplicate the "nt" alias (the .nal file). It's a text file that tells BLAST to look at all the other volumes. If you, e.g., reduce the number of actual databases listed there, you could see whether one of them is contributing to breaking things.
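The reduced-alias idea can also be tried without hand-editing the .nal file, since BLAST+ ships blastdb_aliastool for writing alias files. A minimal sketch, assuming two-digit volume names (nt.00 ... nt.77) as in this thread and that it is run inside the database directory; the alias name nt_subset and both function names are hypothetical:

```shell
# Print a space-separated list of volume names, e.g. "nt.00 nt.01 nt.02".
# Arguments are plain integers without leading zeros (first, last).
volume_list() (
    i=$1
    out=""
    while [ "$i" -le "$2" ]; do
        out="$out$(printf 'nt.%02d' "$i") "
        i=$((i + 1))
    done
    printf '%s\n' "${out% }"
)

# Write a hypothetical alias "nt_subset" that covers only volumes first..last.
make_subset_alias() {
    blastdb_aliastool -dbtype nucl -dblist "$(volume_list "$1" "$2")" \
        -out nt_subset -title "nt volumes $1 to $2"
}

# Example (not run here):
#   make_subset_alias 0 9
#   blastn -query oneseq.fasta -db nt_subset -evalue 1e-20 -outfmt 7 -out subset.tsv
```

Growing or shifting the range between runs narrows down which volume, if any, is responsible.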

However, I suspect it's simply that nt is a huge database that takes a long time to run through. RAM, input/output speed and CPU speed are all important players in making BLAST run fast...

1

Thanks a lot for those, I managed to have it run in 10 min for a single sequence after re-downloading nt.

0

Ah that's great.

I didn't think that could be an issue because you said you checked all the md5s... Anyhow, glad to hear it works! (We have actually seen specific versions of BLAST segfault on particular downloads of nr/nt, but only for specific queries!)

0

Yep, the md5 checksums were for the .tar.gz files; I believe some corruption could have happened during the untar.
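Since the checksums only cover the archives, catching a bad untar means comparing the extracted files against the archive contents. A minimal sketch, assuming md5sum, tar and cmp are available and that each archive sits next to a .md5 file in the usual "<hash>  <name>" format; the function names and the archive name in the example are assumptions:

```shell
# Verify a downloaded archive against the .md5 file stored next to it.
check_archive() (
    archive=$1
    cd "$(dirname "$archive")" || exit 1
    md5sum -c "$(basename "$archive").md5" >/dev/null 2>&1
)

# Compare every file in the archive, byte for byte, with its extracted copy in DIR.
# Each member is streamed out again, so this re-reads the archive once per file.
check_extracted() (
    archive=$1
    dir=$2
    tar -tzf "$archive" | while IFS= read -r member; do
        case $member in */) continue ;; esac   # skip directory entries
        tar -xzOf "$archive" "$member" | cmp -s - "$dir/$member" || exit 1
    done
)

# Example (not run here), from inside the database directory:
#   check_archive nt.00.tar.gz && check_extracted nt.00.tar.gz . && echo "nt.00 ok"
```

This is slower than a checksum of the archive alone, but it checks the files that BLAST actually reads.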