Help making a blast database with longer sequence IDs
4.8 years ago
ravila • 0

I am trying to create a BLAST database with protein chain sequences. My FASTA looks like this:

>1fig_1
ENVLTQSPAIMSASPGEKVTMACRASSSVSSTYLHWYQQKSGASPKLLIYSTSNLASGVP
ARFSGS
>1p8v_2
EADCGLRPLFEKKSLEDKTERELLESYID 
>5ivx_13
MSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRARWIEQEGPEYW
ERETRRAKGNEQSFRVDL
.. etc.

Here the number after the underscore is the entity ID from the mmCIF file. (I am not using chain IDs because they are more redundant.)

I keep getting the error:

BLAST Database creation error: Multi-letters chain PDB id is not supported in v4 BLAST DB

The problem, I gather, is that makeblastdb seems to expect a maximum of six characters in the id, so if the number after the underscore is double-digit, it fails.

However, given that some PDB entries have more than 9 entities, I need to use more characters. Is there a way to get around this limit? (I am currently using makeblastdb version 2.9.0.)
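
To see which headers are affected, here is a minimal sketch that lists IDs shaped like a 4-character PDB code, an underscore, and a suffix of two or more characters. The pattern is an assumption drawn from the examples in this post (e.g. 5ivx_13) and only approximates whatever rule makeblastdb applies internally; `list_long_chain_ids` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# List FASTA IDs that look like <4-char PDB code>_<suffix of 2+ characters>.
# Assumption: this pattern only approximates what makeblastdb treats as a PDB ID.
list_long_chain_ids() {
    # $1: path to a FASTA file
    grep '^>' "$1" \
        | sed -e 's/^>//' -e 's/[[:space:]].*$//' \
        | grep -E '^[0-9][A-Za-z0-9]{3}_[A-Za-z0-9]{2,}$'
}

# Small demo input built from the headers shown in the question.
demo=$(mktemp)
printf '>1fig_1\nENVLTQ\n>1p8v_2\nEADCGL\n>5ivx_13\nMSHSLR\n' > "$demo"
list_long_chain_ids "$demo"   # prints 5ivx_13
rm -f "$demo"
```

Running the function on the real FASTA file, rather than the demo file, should show every header with a multi-character suffix.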

blast mkblastdb fasta sequence protein • 4.1k views
4.8 years ago
GenoMax 141k

Build a v5 database by adding the option below. The default is 4, so add -blastdb_version 5 to your makeblastdb command. The support announcement from NCBI is here.

-blastdb_version <Integer, 4..5>
   Version of BLAST database to be created
   Default = `4'
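
Put together, the call might look like the sketch below. The file name chains.fasta and the database name chains_db are placeholders, `make_v5_db` is a hypothetical wrapper, and -parse_seqids is an assumption, since the original command was not posted.

```shell
#!/usr/bin/env bash
# Sketch: build a v5 protein database; names below are placeholders.
make_v5_db() {
    # $1: input FASTA, $2: output database name
    makeblastdb -in "$1" -dbtype prot -parse_seqids -blastdb_version 5 -out "$2"
}

in_fasta="chains.fasta"   # hypothetical input file
# Only build when makeblastdb is installed and the input file exists.
if command -v makeblastdb >/dev/null 2>&1 && [ -f "$in_fasta" ]; then
    make_v5_db "$in_fasta" chains_db
else
    echo "makeblastdb or $in_fasta not found; nothing built"
fi
```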

Thanks! That solved the issue.

Hi all, I ran makeblastdb -in blastdb_nt.fasta -input_type fasta -dbtype nucl -title hbbNtBlast -parse_seqids -out hbbNtBlast and received the error BLAST Database creation error: Multi-letters chain PDB id is not supported in v4 BLAST DB. I then applied your method. After finishing the version update, I ran the same makeblastdb command but got the same error again. What should I do?

This is my first time posting a question here; I hope I have made my issue clear. I would really appreciate any suggestions.

As you can see above, the original question involved protein sequences. From your command it looks like you have nucleotide sequences, so I am not sure whether the error you are seeing has the same cause as the one in the original question.

Can you post the output of grep "^>" blastdb_nt.fasta | head -5 so we can take a look at what your FASTA headers look like?

Yes, exactly: I am running blastn, which is for nucleotide sequences. Here is the output of that command:

>X17276.1 Giant Panda satellite 1 DNA
>X51700.1 Bos taurus mRNA for bone Gla protein
>X68321.1 B.taurus mRNA for cyclin A
>X55027.1 Bovine mRNA for chromogranin B
>Z12029.1 B.indicus gene for alpha-lactalbumin