Question: makeblastdb.exe giving me an unclear error message when creating a database
22 months ago, DNAngel wrote:

I have a large sequence file that I want to convert into a database so that I can BLAST other sequences against it. I've done this many times before with smaller files; however, this one gives me an unclear error message:

Building a new DB, current time: 10/23/2017 14:17:46
New DB name:   ~\blast\db\mydatabase
New DB title:  ~\blast\myseqs.fa
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B

volume: ~\blast\db\mydatabase

file: ~\blast\db\mydatabase.nin
file: ~\blast\db\mydatabase.nhr
file: ~\blast\db\mydatabase.nsq

BLAST Database creation error: Need to write conversion for data type [0].

Note: I have no missing residues (no empty lines), but my sequences do contain gaps, each represented by "-". I thought that might be the problem, but when I take, say, the first 10 sequences (keeping the gaps) from the same file, they convert into a database without trouble. So I thought it might be the file size (about 47,400 KB) and split the file into 3 smaller files. Only the second of the three converted successfully into a nucleotide database; the other 2 did not (note: they were all the same size, and nothing was different about the sequences).

Here is the very simple command I used and have always used before with no issue:

makeblastdb.exe -in myseqs.fa -dbtype nucl -out mydatabase

I've contacted the NCBI support group for standalone BLAST, but they have not responded, and I could not find any other instance of that error message on Google. I'm stumped.


You are using a single - to represent gaps of any length, correct?

written 22 months ago by genomax

Each '-' represents 1 gap position in the sequence, so one hyphen = one base.

written 22 months ago by DNAngel
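
The thread never confirms that the gap characters are the cause, so the following is only a diagnostic sketch: it writes a copy of the FASTA file with every '-' removed from the sequence lines, so that makeblastdb can be rerun on the copy to test the hypothesis. The function names and the file names in the usage note are hypothetical. Removing gaps discards the alignment, which is assumed to be acceptable for a BLAST database.

```python
def strip_gaps(lines, gap_char="-"):
    """Yield FASTA lines with gap characters removed from sequence lines."""
    for line in lines:
        if line.startswith(">"):
            yield line  # header lines pass through unchanged
            continue
        cleaned = line.replace(gap_char, "")
        # Skip lines that held nothing but gaps, so no empty lines are created.
        if cleaned.strip():
            yield cleaned


def degap_file(src_path, dst_path, gap_char="-"):
    """Write a gap-free copy of src_path to dst_path (hypothetical helper)."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        dst.writelines(strip_gaps(src, gap_char))
```

For example, `degap_file("myseqs.fa", "myseqs.nogaps.fa")` followed by the same makeblastdb command on the new file. A record that consisted only of gaps would end up as a header with no sequence, so such records should be checked by hand.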
22 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

this message seems to be generated when your DNA is not:

and then: boooom https://github.com/LeeBergstrand/NCBI_Tools/blob/281d543937237a80364105b4d14b3671136e06aa/src/objtools/blast/seqdb_writer/writedb_impl.cpp#L898

Check your DNA sequences and search for strange characters in the FASTA file, e.g.:

 grep -v '^>' input.fa | grep -o . | sort | uniq -c
written 22 months ago by Pierre Lindenbaum
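
For readers without a working grep/sort toolchain, the same character census can be sketched in Python. This is a minimal sketch, not part of the original answer: the function names are hypothetical, and it assumes the file is meant to be plain ASCII, so any other byte is shown as a replacement character instead of raising an error. It reads the file line by line and keeps only the counts in memory, so very long lines are not a problem.

```python
from collections import Counter


def count_residues(handle):
    """Count every character found on the non-header lines of a FASTA stream."""
    counts = Counter()
    for line in handle:
        if line.startswith(">"):
            continue  # skip headers, like grep -v '^>'
        counts.update(line.rstrip("\r\n"))  # count characters, not line endings
    return counts


def report(path):
    """Print one 'count character' row per distinct character (hypothetical helper)."""
    # errors="replace" turns non-ASCII bytes into U+FFFD so they show up in the table.
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        for char, n in sorted(count_residues(handle).items()):
            print(f"{n:>12} {char!r}")
```

Calling `report("input.fa")` prints a table comparable to the output of `uniq -c`. For a nucleotide file, anything other than IUPAC nucleotide codes (and, here, '-') deserves a closer look.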

I tried your suggestion for checking for weird characters, but I keep getting another error. Perhaps this is where the issue is? I don't understand the error, though (I am not great with grep/Linux commands).

It says:

Input record exceeds maximum length. Specify larger maximum.

grep: write error: Illegal seek
grep: write error: Invalid or incomplete multibyte or wide character

written 22 months ago by DNAngel

What is the output of

file input.fa

It should be something like 'ASCII text'.

written 22 months ago by Pierre Lindenbaum

It says: input.fa: ASCII text, with very long lines

written 22 months ago by DNAngel

This sounds like an issue related to sort on Windows. Do you have access to a Unix machine? Otherwise, you could try wrapping the long FASTA lines.

written 22 months ago by genomax
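
Wrapping the long lines could be sketched as follows. This is an illustrative sketch under stated assumptions: the function names and the width of 60 characters are choices made here, not something the thread specifies. Each sequence line is split independently into fixed-width chunks, header lines are never split, and line endings are normalized to "\n".

```python
def wrap_fasta(lines, width=60):
    """Yield FASTA lines with sequence lines split into chunks of at most `width`."""
    for line in lines:
        text = line.rstrip("\r\n")
        if text.startswith(">") or len(text) <= width:
            yield text + "\n"  # headers and short lines pass through
            continue
        for start in range(0, len(text), width):
            yield text[start:start + width] + "\n"


def wrap_file(src_path, dst_path, width=60):
    """Write a wrapped copy of src_path to dst_path (hypothetical helper)."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        dst.writelines(wrap_fasta(src, width))
```

For example, `wrap_file("input.fa", "input.wrapped.fa")` produces a copy on which the grep pipeline above can be retried. Whether wrapping also changes the makeblastdb result is not established in this thread.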