Question: makeblastdb.exe gives an unclear error message when creating a database
DNAngel wrote (2.8 years ago):

I have a large sequence file that I want to convert into a BLAST database so that I can BLAST other sequences against it. I've done this many times before with smaller files, but this one gives me an unclear error message:

Building a new DB, current time: 10/23/2017 14:17:46
New DB name:   ~\blast\db\mydatabase
New DB title:  ~\blast\myseqs.fa
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B

volume: ~\blast\db\mydatabase

file: ~\blast\db\mydatabase.nin
file: ~\blast\db\mydatabase.nhr
file: ~\blast\db\mydatabase.nsq

BLAST Database creation error: Need to write conversion for data type [0].

Note: I do not have missing residues (no empty lines), but my sequences do contain gaps, with "-" representing gaps. I thought that might be the problem, but when I take, say, the first 10 sequences from the same file (keeping the gaps), they convert into a database without any trouble. Next I suspected the file size (about 47,400 KB), so I split the file into 3 smaller files. Only the second of the three converted successfully into a nucleotide database; the other 2 did not (they were all the same size, and nothing was different about the sequences).
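A quick way to test the gap hypothesis on the whole file, rather than on the first 10 records, is to build a gap-free copy and run the same makeblastdb command on it. Below is a minimal sketch in shell, assuming POSIX `sed` and `awk` are available (e.g. Git Bash, Cygwin or WSL on Windows); the function names `strip_gaps` and `empty_after_gaps` are hypothetical. The second function lists records that have no residues left once gaps are ignored, which is one more thing worth ruling out; nothing in this thread confirms that either is the cause.

```shell
#!/bin/sh
# Hypothetical helpers; assume POSIX sed and awk (e.g. Git Bash, Cygwin or WSL).

# Print the FASTA with every '-' removed from sequence lines;
# '>' header lines are left untouched.
strip_gaps() {
    sed '/^>/!s/-//g' "$1"
}

# Print the header of every record that has no residues left once
# gaps, spaces, tabs and carriage returns are ignored.
empty_after_gaps() {
    awk '
        /^>/ {
            if (hdr != "" && n == 0) print hdr
            hdr = $0
            n = 0
            next
        }
        {
            line = $0
            gsub(/[- \t\r]/, "", line)
            n += length(line)
        }
        END { if (hdr != "" && n == 0) print hdr }
    ' "$1"
}
```

For example, `strip_gaps myseqs.fa > myseqs.nogap.fa` followed by `makeblastdb.exe -in myseqs.nogap.fa -dbtype nucl -out mydatabase_nogap`, and `empty_after_gaps myseqs.fa` to list suspicious headers. Hits against a gap-stripped database are reported in ungapped coordinates, so positions will not line up with the original alignment columns.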

Here is the simple command I used, the same one I have always used before without any issue:

makeblastdb.exe -in myseqs.fa -dbtype nucl -out mydatabase

I've contacted NCBI's standalone BLAST support group, but they have not responded at all, and I couldn't find any other instances of this error message on Google. I'm stumped.

Tags: blast

genomax asked:

You are using a single - to represent gaps of any length, correct?

DNAngel replied:

Each '-' represents one gap position in the sequence, so one hyphen = one base.
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 2.8 years ago:

This message seems to be generated when your DNA is not:

and then: boooom https://github.com/LeeBergstrand/NCBI_Tools/blob/281d543937237a80364105b4d14b3671136e06aa/src/objtools/blast/seqdb_writer/writedb_impl.cpp#L898

Check your DNA sequences and search for strange characters in the FASTA file, e.g.:

 grep -v '^>' input.fa | grep -o . | sort | uniq -c
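If the `sort | uniq -c` stage of that pipeline causes trouble (on Windows, for instance, the built-in `sort.exe` can end up in the pipe instead of GNU sort), the same census can be taken in a single awk pass that never calls `sort`. A minimal sketch, assuming a POSIX awk; `char_census` and `odd_chars` are hypothetical names, and the allowed set in `odd_chars` (IUPAC nucleotide codes plus `-`) is an assumption, not the exact rule makeblastdb applies.

```shell
#!/bin/sh
# Hypothetical helpers; assume a POSIX awk (e.g. Git Bash, Cygwin or WSL).
# LC_ALL=C makes awk treat the file as plain bytes, sidestepping
# multibyte-locale errors and keeping substr() fast on very long lines.

# Count every character on sequence (non-'>') lines, like
#   grep -v '^>' input.fa | grep -o . | sort | uniq -c
# but in one pass and without calling sort. Output order is arbitrary.
char_census() {
    LC_ALL=C awk '
        /^>/ { next }
        {
            n = length($0)
            for (i = 1; i <= n; i++) count[substr($0, i, 1)]++
        }
        END {
            for (c in count) {
                name = c
                if (c == "\r") name = "CR"
                else if (c == " ") name = "space"
                else if (c == "\t") name = "tab"
                printf "%d\t%s\n", count[c], name
            }
        }
    ' "$1"
}

# Report sequence lines containing anything other than the assumed
# alphabet (IUPAC nucleotide codes or '-'), with the leftover characters.
odd_chars() {
    LC_ALL=C awk '
        /^>/ { hdr = $0; next }
        {
            rest = $0
            gsub(/[ACGTURYSWKMBDHVNacgturyswkmbdhvn-]/, "", rest)
            if (rest != "") printf "line %d (%s): %s\n", NR, hdr, rest
        }
    ' "$1"
}
```

For example, `char_census myseqs.fa` prints one `count<TAB>character` line per distinct character (CR, space and tab spelled out), and `odd_chars myseqs.fa` prints nothing if every sequence line uses only the assumed alphabet.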

DNAngel replied:

I tried your suggestion for checking for weird characters, but I keep getting a different error. Perhaps this is where the issue is? I don't understand the error, though (I am not great with grep/Linux commands).

It says:

Input record exceeds maximum length. Specify larger maximum.

grep: write error: Illegal seek
grep: write error: Invalid or incomplete multibyte or wide character

Pierre Lindenbaum replied:

What is the output of

file input.fa

It should be something like 'ASCII text'.
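To dig a little further than `file` does, the sketch below counts non-ASCII bytes and carriage returns (as left by Windows CRLF line endings) and reports the longest line. It assumes POSIX `tr`, `wc` and `awk`; `text_check` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper; assumes POSIX tr, wc and awk.
# Summarises the properties that `file` hints at for a FASTA file.
text_check() {
    # Bytes outside 7-bit ASCII (0-127); expected to be 0 when
    # `file` reports plain "ASCII text".
    nonascii=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' ')
    # Carriage returns, e.g. from Windows CRLF line endings.
    cr=$(LC_ALL=C tr -cd '\r' < "$1" | wc -c | tr -d ' ')
    # Length in bytes of the longest line.
    longest=$(LC_ALL=C awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }' "$1")
    printf 'non-ASCII bytes: %s\nCR characters: %s\nlongest line: %s\n' "$nonascii" "$cr" "$longest"
}
```

For a file that `file` describes as "ASCII text, with very long lines", one would expect 0 non-ASCII bytes, 0 CR characters (CRLF files are normally reported "with CRLF line terminators"), and a large longest-line value.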

DNAngel replied:

It says: input.fa: ASCII text, with very long lines

genomax replied:

This sounds like an issue with sort on Windows. Do you have access to a Unix machine? Otherwise, you could try wrapping the long FASTA lines.
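Wrapping can be done without `sort` or any Windows-specific tools. A minimal sketch, assuming a POSIX awk; `wrap_fasta` is a hypothetical name and 60 columns is an arbitrary default. Header lines pass through unchanged, each sequence line is cut into pieces of at most the given width, and blank lines are dropped. Whether wrapping also helps makeblastdb here is not established in this thread; it mainly keeps line-oriented tools within their limits.

```shell
#!/bin/sh
# Hypothetical helper; assumes a POSIX awk (e.g. Git Bash, Cygwin or WSL).
# Usage: wrap_fasta FILE [WIDTH]   (WIDTH defaults to 60)
wrap_fasta() {
    LC_ALL=C awk -v w="${2:-60}" '
        # Header lines are printed unchanged.
        /^>/ { print; next }
        # Sequence lines are split into chunks of at most w characters;
        # an empty line produces no output.
        {
            n = length($0)
            for (i = 1; i <= n; i += w) print substr($0, i, w)
        }
    ' "$1"
}
```

For example, `wrap_fasta myseqs.fa 60 > myseqs.wrapped.fa`, then rerun the census or makeblastdb on the wrapped file.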