Question

makeblastdb.exe giving me an unclear error message when creating a database

Asked 7.7 years ago by DNAngel

I have a large sequence file that I want to convert into a database so I can BLAST other sequences against it. I've done this many times before with smaller files; however, this one gives me an unclear error message:

Building a new DB, current time: 10/23/2017 14:17:46
New DB name:   ~\blast\db\mydatabase
New DB title:  ~\blast\myseqs.fa
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B

volume: ~\blast\db\mydatabase

file: ~\blast\db\mydatabase.nin
file: ~\blast\db\mydatabase.nhr
file: ~\blast\db\mydatabase.nsq

BLAST Database creation error: Need to write conversion for data type [0].

Note: I do not have missing residues (no empty lines), but my sequences do contain gaps, with "-" representing a gap. I thought that might be the problem, but when I take, say, the first 10 sequences (keeping the gaps) from the same file, they convert into a database without trouble. Next I suspected the file size (about 47,400 KB), so I split the file into 3 smaller files. Only the second of the three converted successfully into a nucleotide database; the other 2 did not (they were all the same size, and nothing was different about the sequences).
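The subsetting experiments described above (the first 10 records, then three roughly equal parts) can be reproduced so that no record is ever cut in half. A minimal sketch, assuming a POSIX shell with grep and awk (for example Git Bash or Cygwin on Windows); the function names head_fasta and split_fasta and the output naming scheme are hypothetical:

```shell
#!/bin/sh
# head_fasta FILE N: print the first N records of FILE.
head_fasta() {
  awk -v max="$2" '/^>/ { n++; if (n > max) exit } { print }' "$1"
}

# split_fasta FILE N: write FILE's records into at most N contiguous parts
# named <FILE without extension>_part1.fa, _part2.fa, ... A new part only
# ever starts at a '>' header, so no record is split across two files.
split_fasta() {
  src=$1
  parts=$2
  total=$(grep -c '^>' "$src")            # number of records
  per=$(( (total + parts - 1) / parts ))  # records per part, rounded up
  awk -v per="$per" -v prefix="${src%.*}_part" '
    /^>/ { rec++; out = prefix (int((rec - 1) / per) + 1) ".fa" }
    { print > out }
  ' "$src"
}

# Hypothetical usage with the file name from the question:
# head_fasta myseqs.fa 10 > first10.fa
# split_fasta myseqs.fa 3
```

Splitting by record count rather than by bytes means every part boundary falls on a header line; repeatedly halving whichever part fails is one way to narrow the error down to a single record.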

Here is the simple command I used, which has always worked before:

makeblastdb.exe -in myseqs.fa -dbtype nucl -out mydatabase

I've contacted the NCBI standalone BLAST support group, but they have not responded, and I couldn't find any other instances of this error message on Google. I'm stumped.

Tag: blast

You are using a single "-" to represent a gap of any length, correct?

Reply by GenoMax, 7.7 years ago

Each '-' represents one gap position in the sequence, so one hyphen = one base.

Reply by DNAngel, 7.7 years ago
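Given this exchange about gaps, one diagnostic worth trying is to remove the '-' characters and rebuild the database. It is only an experiment, not a confirmed fix: the question notes that a 10-sequence subset that keeps its gaps builds fine. The sketch assumes the hyphens are alignment gaps that the database does not need (BLAST computes its own alignments) and that a POSIX awk is available; strip_gaps is a hypothetical helper name:

```shell
#!/bin/sh
# strip_gaps IN OUT: copy a FASTA file, deleting every '-' from sequence
# lines. Header lines (starting with '>') are copied unchanged because IDs
# and descriptions may legitimately contain hyphens. A sequence line that
# consisted only of gaps becomes empty and is dropped.
strip_gaps() {
  awk '/^>/ { print; next } { gsub(/-/, "") } length($0) > 0 { print }' "$1" > "$2"
}

# Hypothetical usage with the names from the question:
# strip_gaps myseqs.fa myseqs.nogap.fa
# makeblastdb.exe -in myseqs.nogap.fa -dbtype nucl -out mydatabase
```

Removing gaps changes positions relative to the original alignment, so any hits would be reported in ungapped sequence coordinates.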

Answer (2017-10-23)

This message seems to be generated when your DNA is not:

and then: boooom https://github.com/LeeBergstrand/NCBI_Tools/blob/281d543937237a80364105b4d14b3671136e06aa/src/objtools/blast/seqdb_writer/writedb_impl.cpp#L898

Check your DNA sequences and search the FASTA file for unexpected characters, e.g.:

 grep -v '^>' input.fa | grep -o . | sort | uniq -c

Answer by Pierre Lindenbaum, 7.7 years ago
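The error reported in the next reply appears to come from the sort stage of this pipeline, and grep's multibyte complaint from the locale. The same character check can be done with awk alone. A minimal sketch, assuming a POSIX awk is available; LC_ALL=C makes awk count raw bytes instead of decoding multibyte characters, and count_chars is a hypothetical name:

```shell
#!/bin/sh
# count_chars FILE: tally every byte on non-header lines using awk only
# (no sort), so very long lines never reach an external sort program.
# LC_ALL=C treats the input as plain bytes, avoiding multibyte decoding
# errors. A carriage return is printed as \r so Windows line endings are
# visible. Output order is arbitrary.
count_chars() {
  LC_ALL=C awk '
    !/^>/ {
      n = length($0)
      for (i = 1; i <= n; i++) count[substr($0, i, 1)]++
    }
    END {
      for (c in count) {
        label = (c == "\r") ? "\\r" : c
        printf "%d\t%s\n", count[c], label
      }
    }
  ' "$1"
}

# Hypothetical usage:
# count_chars input.fa
```

Any character in the output other than A, C, G, T, N, the other IUPAC nucleotide codes and '-' is a candidate for the problem; a \r entry would indicate Windows line endings.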

I tried your suggestion for checking for unusual characters, but I keep getting a different error. Perhaps this is where the issue is? I don't understand the error, though (I'm not great with grep/Linux commands).

It says:

Input record exceeds maximum length. Specify larger maximum.

grep: write error: Illegal seek
grep: write error: Invalid or incomplete multibyte or wide character

Reply by DNAngel, 7.7 years ago

What is the output of:

file input.fa

It should be something like 'ASCII text'.

Reply by Pierre Lindenbaum, 7.7 years ago

It says: input.fa: ASCII text, with very long lines

Reply by DNAngel, 7.7 years ago
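To see how long the "very long lines" reported by file actually are, the longest line length can be printed without involving sort. A minimal sketch, assuming a POSIX awk; longest_line is a hypothetical name:

```shell
#!/bin/sh
# longest_line FILE: print the length in bytes of the longest line in FILE
# (0 for an empty file).
longest_line() {
  LC_ALL=C awk 'length($0) > max { max = length($0) } END { print max + 0 }' "$1"
}

# Hypothetical usage:
# longest_line input.fa
```

Windows' own sort.exe limits the length of a record, which would be consistent with the "Input record exceeds maximum length" message above if that sort was the one the pipeline picked up.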

This sounds like an issue with sort on Windows. Do you have access to a Unix machine? Otherwise, you could try wrapping the long FASTA lines.

Reply by GenoMax, 7.7 years ago
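Wrapping the long FASTA lines can be done with awk alone. A minimal sketch, assuming a POSIX awk; wrap_fasta is a hypothetical name, and the default width of 60 is an arbitrary but common choice. It joins each record's sequence lines before rewrapping, so line lengths come out uniform; whether wrapping also resolves the makeblastdb error itself is not established in this thread.

```shell
#!/bin/sh
# wrap_fasta IN OUT [WIDTH]: rewrap each record's sequence to WIDTH
# characters per line (default 60). A record's sequence lines are joined
# first, so every output line except a record's last is exactly WIDTH
# long. Header lines are copied unchanged.
wrap_fasta() {
  LC_ALL=C awk -v w="${3:-60}" '
    function flush(   i) {
      for (i = 1; i <= length(seq); i += w) print substr(seq, i, w)
      seq = ""
    }
    /^>/ { flush(); print; next }
    { seq = seq $0 }
    END { flush() }
  ' "$1" > "$2"
}

# Hypothetical usage:
# wrap_fasta input.fa input.wrapped.fa 60
```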