Makeblastdb Error In Blast+ 2.2.26
12.3 years ago
ranlib ▴ 40

I am running makeblastdb from BLAST+ version 2.2.26 on uniprot_trmbl.fasta and get the following error:

BLAST Database creation error: Error: Duplicate seq ids are found: GNL|BLORDID|2707210

Any ideas what's going on?

Thanks.


Adding -max_file_sz '10GB' (the default is 1GB) solved the problem.

8.2 years ago
shengweima ▴ 60

Yes, I have the same problem with BLAST 2.4.0+:

makeblastdb -in 160509_Chinese_Spring_v0.4_pseudomolecules.fasta -hash_index -title target -dbtype nucl

The error is:

BLAST Database creation error: Error: Duplicate seq_ids are found: 
GNL|BL_ORD_ID:5

In fact, there are no duplicate IDs in my FASTA file. The file is about 15 GB.

Solution: add the -parse_seqids option.
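For anyone scripting this, here is a minimal Python sketch that assembles a makeblastdb argument list with the workarounds reported in this thread. It only builds the list and runs nothing. build_makeblastdb_cmd is a hypothetical helper name, the size option is spelled -max_file_sz as in the makeblastdb help, and whether either option avoids the error for a particular file is not guaranteed.

```python
import shlex


def build_makeblastdb_cmd(fasta_path, dbtype="nucl", title=None,
                          parse_seqids=True, max_file_sz=None):
    """Assemble a makeblastdb argument list (hypothetical helper, runs nothing)."""
    cmd = ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype]
    if title is not None:
        cmd += ["-title", title]
    if parse_seqids:
        # Workaround reported above: take sequence IDs from the FASTA headers.
        cmd.append("-parse_seqids")
    if max_file_sz is not None:
        # Workaround reported elsewhere in this thread, e.g. "2GB".
        cmd += ["-max_file_sz", max_file_sz]
    return cmd


cmd = build_makeblastdb_cmd(
    "160509_Chinese_Spring_v0.4_pseudomolecules.fasta",
    title="target",
    max_file_sz="2GB",
)
# Print a shell-quoted version for inspection before running it.
print(shlex.join(cmd))
```

If makeblastdb is on the PATH, the list can be passed to subprocess.run(cmd, check=True).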

12.3 years ago
JC 13k

Well, the error message is saying that your FASTA file has duplicate IDs. BLAST tries to parse the FASTA header to obtain a unique ID (check http://www.uniprot.org/help/fasta-headers). If your database doesn't include unique IDs, you have two options: 1) remove the duplicate sequences, or 2) change the IDs to some unique key.
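Option 2 could look something like the following Python sketch. It assumes the ID is the first space-delimited token of each header line, and it appends _2, _3, ... to repeated IDs. make_ids_unique is a hypothetical helper name, and the sketch does not guard against a suffixed ID colliding with an ID that already exists in the file.

```python
from collections import Counter


def make_ids_unique(lines):
    """Yield FASTA lines, suffixing repeated IDs with _2, _3, ... (illustrative)."""
    seen = Counter()
    for line in lines:
        if not line.startswith(">"):
            # Sequence lines pass through unchanged.
            yield line
            continue
        header = line[1:].rstrip("\n")
        # Assumption: the ID is everything before the first space.
        seq_id, _, description = header.partition(" ")
        seen[seq_id] += 1
        if seen[seq_id] > 1:
            seq_id = f"{seq_id}_{seen[seq_id]}"
        new_header = f">{seq_id} {description}" if description else f">{seq_id}"
        yield new_header + "\n"


example = [">sp1 first\n", "MKV\n", ">sp1 second\n", "MKL\n", ">sp2\n", "MAA\n"]
fixed = list(make_ids_unique(example))
print("".join(fixed), end="")
```

For a real file, iterate over an open input handle and write each yielded line to a new output file instead of building a list.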


It must be some sort of bug; there are no duplicates in that FASTA file. The error no longer showed up once I added -max_file_sz '10GB' to makeblastdb.


The same bug appears to be present in the latest BLAST (2.4.0+). Setting -max_file_sz '2GB' fixed it for me. This particular db has a few very large genomes in it, which may be related to the error.


Dear Ranlib, when I tried to set -max_file_sz to 10GB, I got an error saying the maximum size must be lower than 4 GB.


Did you find any solution to this?

I'm also getting this duplicate error when using -parse_seqids and/or -max_file_sz 4GB. I downloaded the TrEMBL data from https://www.uniprot.org/help/downloads.

Thanks


Hmm, 4 GB is a historical maximum file size limit for many filesystems.

It's possible (but unlikely) that TrEMBL messed up. Perhaps try something along these lines:

grep '^>' db.fasta | cut -f 1 -d ' ' | sort | uniq -d

This shows whether any obviously duplicated identifiers are in there.
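A similar check can be written in Python. This is a minimal sketch, assuming the ID is the first whitespace-delimited token of each header line; find_duplicate_ids is a hypothetical helper name. It keeps every ID in memory, so a very large file such as TrEMBL may need a machine with plenty of RAM.

```python
from collections import Counter


def find_duplicate_ids(lines):
    """Return {id: count} for IDs seen more than once in FASTA header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # An empty header is counted under the empty string.
            counts[fields[0] if fields else ""] += 1
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


example = [">a desc one\n", "ACGT\n", ">b\n", "GGCC\n", ">a desc two\n", "TTAA\n"]
print(find_duplicate_ids(example))
```

On a real file: open db.fasta and pass the file handle to find_duplicate_ids; an empty result means no first-token duplicates were found.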

It may also be worth formatting with a different version of BLAST. They sometimes have bugs too... but more importantly the more recent versions have become better at explaining errors, including by highlighting specific problematic lines.

Finally, if you get an explicit error, try to disentangle where it is coming from by looking for that identifier in the input file.
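A rough Python sketch for that last step follows. headers_matching searches header lines for an identifier and reports line numbers. nth_header returns the header of the N-th record, on the assumption (not confirmed in this thread) that the number in a BL_ORD_ID message is a zero-based position in the input rather than something present in the file. Both function names are hypothetical.

```python
from itertools import islice


def headers_matching(lines, identifier):
    """Return (line_number, header) pairs whose header contains identifier."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if line.startswith(">") and identifier in line
    ]


def nth_header(lines, index):
    """Return the header of the index-th record (0-based), or None if absent."""
    headers = (line.rstrip("\n") for line in lines if line.startswith(">"))
    return next(islice(headers, index, None), None)


example = [">seq1 alpha\n", "ACGT\n", ">seq2 beta\n", "GGCC\n", ">seq3 alpha\n", "TT\n"]
print(headers_matching(example, "alpha"))
print(nth_header(example, 1))
```

Both functions accept any iterable of lines, so an open file handle works in place of the example list.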
