The process ran fine for a while, but eventually I got the error, "No space left on device." Since I still had hundreds of GB of remaining empty space on the C drive, I assume this is an issue with the RAM.
I don't think you should assume that. The error message is clear. DIAMOND could have tried to write a file, failed, and then deleted the partial file, which would explain why the drive still shows plenty of free space afterwards.
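If you want to check this rather than guess, here is a minimal Python sketch that reports free space on the volumes a run might write to. The two locations it checks (the current working directory and the system temporary directory) are my assumptions about where files could end up, not something taken from your command line, so substitute the directories your run actually uses.

```python
import shutil
import tempfile
from pathlib import Path


def free_gb(path):
    """Return the free space, in GB, on the volume that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024**3


def report_free_space(paths):
    """Map each resolved directory to its free space in GB (one decimal)."""
    report = {}
    for p in paths:
        resolved = Path(p).resolve()
        report[str(resolved)] = round(free_gb(resolved), 1)
    return report


if __name__ == "__main__":
    # Assumed locations: replace these with your real output and temp folders.
    candidates = [Path.cwd(), tempfile.gettempdir()]
    for location, gb in report_free_space(candidates).items():
        print(f"{location}: {gb} GB free")
```

Running it while the search is in progress, not only after it fails, is more informative, because any partial files will already be gone by the time you look.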
It appears that you are still going on with your attempts to BLAST your large database against the NR. I don't mean to be pushy, but it won't work with the resources you have. That's even with DIAMOND being slightly faster than BLAST. You will just end up opening one thread after another with various problems you are likely to encounter along the way.
If you absolutely feel like you want to do this, you were already given a suggestion to use the cluster_nr database. That would be nr clustered at 90% identity, which for practical purposes has the same functionality as nr, but with 60-70% of its size.
I feel like giving it one more try: with the resources you have (both memory and disk space, and the fact that you are doing this on a Windows computer), this will not work in a reasonable amount of time if you are still trying to search 23,000 proteins against nr, or really any database similar in size to nr. I implore you to read through the other suggestions that were given to you, as that will save you a lot of time while also resulting in a better outcome. This is all assuming your goal is to annotate a genome, which as of right now you have yet to confirm.
Did you get the nr.gz that is in FASTA format? Did you see this note in the README at the NCBI FTP site? The file you downloaded is from April 2024. For your use case it may not matter but ...
As for this
No, because even if you get a pre-formatted database from somewhere, the actual alignments will again require more than 32GB of RAM.