makeblastdb Error: ncbi::objects::CSeq_id::x_Init() - Unsupported ID type 1196094
5.1 years ago
chenyil91 • 0

Hi everyone! I recently ran makeblastdb on a server. The command was:

makeblastdb -in kraken2_blast -dbtype nucl -out nt -title "nt" -parse_seqids -hash_index

but it failed with this error:

file: /home/lcy/winterbee/database/kraken-blast/nt.nsq
file: /home/lcy/winterbee/database/kraken-blast/nt.nsi
file: /home/lcy/winterbee/database/kraken-blast/nt.nsd
file: /home/lcy/winterbee/database/kraken-blast/nt.nhi
file: /home/lcy/winterbee/database/kraken-blast/nt.nhd
file: /home/lcy/winterbee/database/kraken-blast/nt.nog

Error: NCBI C++ Exception:
    T0 "/home/coremake/release_build/build/PrepareRelease_Linux64-Centos_JSID_01_350334_130.14.22.10_9008__PrepareRelease_Linux64-Centos_1481139955/c++/compilers/unix/../../src/objects/seq/../seqloc/Seq_id.cpp", line 1911: Error: ncbi::objects::CSeq_id::x_Init() - Unsupported ID type 1196094

Does anyone know how to solve this problem? Looking forward to your reply. Thank you! Chenyi

software error • 2.6k views

Looks like there might be a problem with your input file (headers). Can you post the output of grep '>' kraken2_blast | head? Does the number/ID 1196094 appear in your FASTA input file?
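If grep is not handy, the same check can be sketched in Python. This is only an illustration: the function names are made up for this example, and the sample records stand in for the real kraken2_blast file, whose contents are not shown in the thread.

```python
from itertools import islice


def fasta_headers(lines):
    """Yield FASTA definition lines (those starting with '>') without the newline."""
    for line in lines:
        if line.startswith(">"):
            yield line.rstrip("\n")


def headers_containing(lines, token):
    """Return every definition line that contains `token` as a substring."""
    return [h for h in fasta_headers(lines) if token in h]


if __name__ == "__main__":
    # Made-up records for illustration; for a real file, pass an open file
    # handle instead, e.g. open("kraken2_blast").
    sample = [">lcl|1|1196094|2 example\n", "ACGTTTT\n", ">seq2\n", "TTTT\n"]

    # Rough equivalent of: grep '>' kraken2_blast | head
    for header in islice(fasta_headers(sample), 10):
        print(header)

    # Does the ID from the error message occur in any header?
    print(headers_containing(sample, "1196094"))
```

Because the functions iterate line by line, they should also cope with a large input file without loading it into memory, except that headers_containing collects all matches in a list.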

2.5 years ago
Michael 54k

BLAST is rather picky about identifiers. As soon as your identifiers contain pipes (|), it assumes you are trying to give it NCBI-style identifiers. The behavior of makeblastdb may have changed slightly, so I tested a few options. Note also that makeblastdb only reports the first error it encounters.

>1196094|1196094|1196094|1196094 : this works fine, all numeric doesn't seem to trigger NCBI-schema parsing
ACGTTTT
>gi|1196094|blubb|1196094 : this doesn't work, blubb is an unsupported ID type 
ACGTTTT
>jgi|Naegr1|60034|estExt_fgeneshHS_pg.C_890017 : This is a JGI id, just take it! Bummer!
TTTTTTTTT
>mjgi|Naegr1|60034|estExt_fgeneshHS_pg.C_890017 : This works, first entry length > 3 doesn't seem to trigger NCBI-schema parsing
TTTTTTTTT

>bla|buzz|foob|1196094 : bla (and foo, bar) is obviously not a recognized ID type
ACGT
>lcl|1|1196094|2 : This is likely how the header was formatted in OP's file, a valid ID type, followed by an invalid one: Unsupported ID type 1196094
ACGTTTT
>lcl|1|lcl|2|1196094 : Or like this, 2 valid pairs, plus an additional number
ACGTTTT
>lcl|1|lcl|2|lcl|3|1196094 : Or like this, you name it
ACGTTTT

lcl (local) is a valid ID type, so if you want to fix the file while keeping the number, add lcl| right after the last |, so that the header looks like:

>lcl|1|lcl|1196094

Or, replace all | with another character, or run without -parse_seqids, or ...
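As a rough sketch of the first two fixes, the headers could be rewritten with a few lines of Python. The function names and the "_" replacement character are assumptions made for this example, and whether makeblastdb accepts the rewritten headers still has to be checked against your BLAST+ version.

```python
def split_header(line):
    """Split a definition line into (identifier, rest), where rest keeps the leading space."""
    body = line[1:].rstrip("\n")
    ident, sep, desc = body.partition(" ")
    return ident, sep + desc


def add_lcl_before_last_field(line):
    """Insert 'lcl|' right after the last '|' of the identifier."""
    ident, rest = split_header(line)
    head, bar, last = ident.rpartition("|")
    if not bar:
        # No pipe in the identifier: nothing to change.
        return line.rstrip("\n")
    return ">" + head + "|lcl|" + last + rest


def replace_pipes(line, replacement="_"):
    """Replace every '|' in the identifier with another character."""
    ident, rest = split_header(line)
    return ">" + ident.replace("|", replacement) + rest


def rewrite_fasta(lines, fix):
    """Apply `fix` to definition lines and pass sequence lines through unchanged."""
    for line in lines:
        yield fix(line) if line.startswith(">") else line.rstrip("\n")


if __name__ == "__main__":
    # Made-up records based on the headers tested above.
    records = [">lcl|1|1196094\n", "ACGTTTT\n"]
    print("\n".join(rewrite_fasta(records, add_lcl_before_last_field)))
    print("\n".join(rewrite_fasta(records, replace_pipes)))
```

Only the identifier (the text up to the first space) is touched, so any description after it is kept as it was. Replacing the pipes changes the identifiers themselves, so downstream tools that expect the original IDs would need a mapping back.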

The following blog may also be related: https://blastedbio.blogspot.com/2012/10/my-ids-not-good-enough-for-ncbi-blast.html


See: https://www.ncbi.nlm.nih.gov/books/NBK569841/

The identifier should begin right after the “>” sign on the definition line and contain no spaces and the -parse_seqids flag should be used. In general, you should not use a “|” (bar) in your identifier. The “|” (bar) is a reserved character for the NCBI FASTA ID parser and makeblastdb will return an error unless the bar is used in a specific manner described at https://ncbi.github.io/cxx-toolkit/pages/ch_demo#ch_demo.T5
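A minimal sketch of how a file could be screened against that advice before running makeblastdb. The function name is hypothetical, and the check is deliberately crude: a bar used in the specific manner NCBI documents is legitimate, so a flagged header is only a candidate for review, not a confirmed error.

```python
def check_defline(line):
    """Return a list of possible problems in one FASTA definition line."""
    if not line.startswith(">"):
        return ["not a definition line"]

    problems = []
    body = line[1:].rstrip("\n")

    # The identifier should begin right after the '>' sign.
    if not body or body[0].isspace():
        problems.append("identifier does not start right after '>'")

    # The identifier is the first whitespace-delimited token.
    tokens = body.split(None, 1)
    ident = tokens[0] if tokens else ""

    # '|' is reserved for the NCBI FASTA ID parser.
    if "|" in ident:
        problems.append("identifier contains '|'")

    return problems


if __name__ == "__main__":
    # Made-up headers for illustration.
    for header in [">seq1 some description", "> seq1", ">lcl|1|1196094|2"]:
        print(header, check_defline(header))
```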


Sure, the idea was to find a quick fix for when you get an ill-formatted file (like the JGI file in my example, or the kraken_blast file the OP got). I was trying this with a JGI protein annotation file and needed to keep the identifiers intact.
