Question: makeblastdb Fasta file with 25 sequences gives Error: mdb_env_open: There is not enough space on the disk
minicola0 wrote:

Hi all

I am new to Biostars and BLAST. I am trying to convert a test database of 25 sequences into a BLAST database using standard tutorial commands. However, I get the following error:

Error: mdb_env_open: There is not enough space on the disk.

In my current directory I get a file of >200 GB, which seems ridiculously large for only 25 sequences. Does anyone have an idea what I am doing wrong?

Kind regards

Michaël

Tags: blast, makeblastdb
modified 8 months ago by a.eivazi1820 • written 9 months ago by minicola0

I had the same problem with blast+ 2.10.0 and downloaded version 2.2.30 and it worked without any problem.

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.30/

modified 8 months ago • written 8 months ago by a.eivazi1820

That version is nearly 6 (!) years old. I would not advise using it anymore except in very specific cases (and this is not one of them).

written 8 months ago by lieven.sterck8.5k

Did the BLAST algorithm change in any substantial way since 2015? I think it was already very well established by then, so there should be no difference whatsoever in the main components.

written 5 months ago by predeus1.4k

Yes, it kinda did.

For example, from 2.8 onwards it uses a completely new database schema (though still backwards compatible). The way it handles the alignment statistics and such has also changed, and there have been numerous other bug fixes and improvements (e.g. a hit's e-value from 2.2.30 will no longer be the same in 2.10).

You're of course free to keep using the older version, but then don't expect to be state of the art.

written 5 months ago by lieven.sterck8.5k

That's interesting info, thank you. What is the difference in e-value calculation - can you point me to where it's described? Would hit ranking still be preserved?

written 5 months ago by predeus1.4k

Hmm, I am not that familiar with those details, but looking through the change logs of the BLAST releases might teach you something (I don't think there has been a manuscript on these). Or look here: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=References

written 5 months ago by lieven.sterck8.5k

200 GB is too large, but you have to post the full command and maybe the database to get useful responses.

written 9 months ago by Michael Dondrup47k

that would be

makeblastdb -in genes_blast.fsa -dbtype nucl -out test

written 9 months ago by minicola0

Ok, now we need the size of genes_blast.fsa, please post the output of:

ls -lh genes_blast.fsa
wc genes_blast.fsa
head genes_blast.fsa

Edit: I think you are using the wrong file; .fsa might already be output of makeblastdb, and the real input file might be .fna or .fasta.

Just delete all output and make sure you only have a FASTA file in the working directory that looks like:

 >header
 ACGT
 >seq
 ACCCT

Then run the command again on that file.
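
For anyone on Windows without ls, wc, and head, roughly the same checks can be sketched in Python. This is only an illustration: the function name summarize_fasta is made up here, and it assumes the usual FASTA convention that a header is any line starting with ">" in the first column.

```python
import os


def summarize_fasta(path):
    """Return basic facts about a FASTA file: size, line count, header count.

    Hypothetical helper for illustration; it is not part of BLAST+.
    """
    headers = 0
    lines = 0
    first_nonblank = None
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            lines += 1
            if not line.strip():
                continue  # skip blank lines
            if first_nonblank is None:
                first_nonblank = line
            # Assumption: headers start with ">" in the first column.
            if line.startswith(">"):
                headers += 1
    return {
        "bytes": os.path.getsize(path),
        "lines": lines,
        "headers": headers,
        # The first non-blank line of a FASTA file should be a header.
        "starts_with_header": bool(first_nonblank) and first_nonblank.startswith(">"),
    }
```

Calling summarize_fasta("genes_blast.fasta") on a 25-sequence file should report 25 headers; a header count of 0 would point to the missing ">" problem.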

modified 9 months ago • written 9 months ago by Michael Dondrup47k

Hi, changing to .fasta does not solve the problem.

  ls -lh genes_blast.fasta: 63k
  wc genes_blast.fasta:1044 1268 63783
  head genes_blast.fsa: 
 >NM_131667.1 Danio rerio GTP cyclohydrolase 2 (gch2), mRNA
GAGTCAGCTCCACGACGATCAACAGGCTACCCAAGCACCGGCTGCAGTTCTGAAGCAACA
TCTGCTCGACTTCCAATATAAATAACAGGCTTGAAATTATTATTATCTTCTAAATAGTCG
ATCATTAGTCAGTATGGAATACCAAAAGGCAGCAGAACTGAACAGTTTGTGCAATGGCAA
AATCGTCACAGAGTATCTCTGCCGCAATGGCTTTAGCGACCTGACGGTCGACACGAAAAA
AGTCGCTGTCCAGCACAAAAACGAGACATCCCGGAAAGAGGAGGAGGATGAGTCGCGGTT
ACCTGCTCTGGAGGCGGCATACACCACTATACTGCGTGGACTGGGGGAAAACACCGACCG
ACAGGGTCTCCTCAAAACCCCTCTCCGTGCTGCCAAAGCCATGCAGTTTCTGACTAAGGG
ATACCACGAGACCATCTACGATATCCTTAACGATGCCATATTTGATGAAGACCATGAAGA
GCTAGTCATTGTGAAAGACATTGACATGTTTTCACTTTGTGAACATCATCTAGTACCATT
modified 9 months ago • written 9 months ago by minicola0

Sorry, then the problem might be irreproducible, because that looks like a small FASTA file (does it have > at the beginning of each FASTA header?). So that means something else might be wrong. If you want, you can upload the input file somewhere (GitHub, Pastebin, etc.) and we can have a try. Otherwise you need to contact NCBI support.

written 9 months ago by Michael Dondrup47k

I tried with just one sequence and even then I get the problem. I tried running it in Windows cmd and from R, but I always get the same problem.

written 9 months ago by minicola0

Try reinstalling the latest version of the BLAST binaries for Windows. makeblastdb should be very stable and is used regularly by many thousands of people, so most likely the error is on your side, possibly in your (Windows?) setup. I am sorry, but I don't think we can solve this problem here.

written 9 months ago by Michael Dondrup47k

minicola: Your FASTA files do not appear to be in the correct format.

It looks like they are missing a > at the beginning of the FASTA header. Is that correct?

written 9 months ago by genomax89k

Sorry, that is a copy-paste error; the ">" disappeared when copying into the text field.

modified 9 months ago • written 9 months ago by minicola0

Hello, I'm getting the very same error. Tested on two computers with different systems, using the most recent version of BLAST from the NCBI download page. Seems like a bug on their side. C.

written 9 months ago by Caya70

Are you using Windows? I don't see this problem on Linux with the latest BLAST+ v2.10.0.

modified 9 months ago • written 9 months ago by genomax89k

Yes, Windows on both PCs. Different versions though.

written 9 months ago by Caya70

If you are sure this is a bug, then please report it to the NCBI help desk. Be aware that it may take them 2-3 business days to respond. It is also the end of the year now, which may increase that time.

written 9 months ago by genomax89k

This seems to be a Windows 10-specific bug. I still have Windows 8 and the latest BLAST works fine on my computer, but my wife, who has Windows 10, couldn't get it to work. The old version did the trick, though.

written 5 months ago by predeus1.4k
Caya70 wrote:

The response from NLM support solved the issue. Here it is:

Thank you for the report. This is a known issue with the Windows release. The program makeblastdb attempts to allocate a very large amount of virtual memory. You can solve the problem by setting a new BLAST environment variable, BLASTDB_LMDB_MAP_SIZE=1000000. See the BLAST setup documentation for details on how to set Windows environment variables (https://www.ncbi.nlm.nih.gov/books/NBK52637/). Once you change the variable, you'll need to close and reopen the command window where you were running BLAST for the new setting to take effect.
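
If you would rather not change system-wide settings, the same variable can also be set for a single makeblastdb call from a script. The sketch below assumes Python 3 and that makeblastdb is on your PATH. The helper names are made up for illustration, and the file names in the usage note come from the command posted earlier in this thread.

```python
import os
import subprocess


def makeblastdb_command(fasta, out, dbtype="nucl", map_size="1000000"):
    """Build the command line and environment for one makeblastdb run.

    Hypothetical helper: it copies the current environment and adds
    BLASTDB_LMDB_MAP_SIZE, the variable named in the NLM support reply.
    """
    env = os.environ.copy()
    env["BLASTDB_LMDB_MAP_SIZE"] = str(map_size)
    cmd = ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", out]
    return cmd, env


def run_makeblastdb(fasta, out, dbtype="nucl", map_size="1000000"):
    """Run makeblastdb with the map size set for this one process only."""
    cmd, env = makeblastdb_command(fasta, out, dbtype, map_size)
    # check=True raises CalledProcessError if makeblastdb exits with an error.
    return subprocess.run(cmd, env=env, check=True)
```

For example, run_makeblastdb("genes_blast.fasta", "test") mirrors the command from the question. The variable is not saved anywhere, so other programs and later sessions are unaffected.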

written 8 months ago by Caya70

Thanks for adding this - it solved the problem for me.

It took me a few goes to work out how to format the new environment variable (I haven't done anything like this before). I created a new 'User Variable'. The variable name was "BLASTDB_LMDB_MAP_SIZE" and the value was "1000000". I was initially entering all the text you included (BLASTDB_LMDB_MAP_SIZE=1000000) as the value, which threw an error.
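
A quick way to confirm that the variable was saved in the right form is to read it back from a freshly opened command window. The check below is a sketch in Python, with a function name made up for illustration. It flags the mistake described above, where the whole NAME=value text ends up in the value field, and it assumes the value should be a plain integer, as in the support reply.

```python
import os


def check_map_size(environ=None):
    """Describe the state of BLASTDB_LMDB_MAP_SIZE in an environment mapping.

    Hypothetical helper for illustration. Returns "missing", "ok", or
    "invalid" (for example when the value is "BLASTDB_LMDB_MAP_SIZE=1000000").
    """
    if environ is None:
        environ = os.environ
    value = environ.get("BLASTDB_LMDB_MAP_SIZE")
    if value is None:
        return "missing"
    # Assumption: the value should consist of digits only.
    return "ok" if value.strip().isdigit() else "invalid"
```

Running print(check_map_size()) in a new window reports "missing" if the variable did not reach the session, and "invalid" if the value contains anything other than digits.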

written 8 months ago by __mark-10

Thanks for coming back and providing closure to this thread.

written 8 months ago by genomax89k