Question

Help making a BLAST database with longer sequence IDs

4.8 years ago

ravila • 0

I am trying to create a BLAST database with protein chain sequences. My FASTA looks like this:

>1fig_1
ENVLTQSPAIMSASPGEKVTMACRASSSVSSTYLHWYQQKSGASPKLLIYSTSNLASGVP
ARFSGS
>1p8v_2
EADCGLRPLFEKKSLEDKTERELLESYID 
>5ivx_13
MSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRARWIEQEGPEYW
ERETRRAKGNEQSFRVDL
.. etc.

The number after the underscore is the entity ID from the mmCIF file. (I am not using chain IDs, because they are more redundant.)

I keep getting the error:

BLAST Database creation error: Multi-letters chain PDB id is not supported in v4 BLAST DB

The problem, I gather, is that makeblastdb seems to expect a maximum of six characters in the ID, so if the number after the underscore has two digits, it fails.

However, some PDB entries have more than 9 entities, so I need to use more characters. Is there a way to get around this limit? (I am currently using makeblastdb version 2.9.0.)
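
To see which records are affected before building the database, a header scan along the following lines may help. This is only a sketch: the sample file is hypothetical and reuses the three records shown above, and the pattern (a four-character PDB code starting with a digit, an underscore, then two or more characters) is an assumption based on the behaviour described in the question, not on makeblastdb's actual parsing rules.

```shell
# Hypothetical sample file reusing the three records from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>1fig_1
ENVLTQSPAIMSASPGEKVTMACRASSSVSSTYLHWYQQKSGASPKLLIYSTSNLASGVP
ARFSGS
>1p8v_2
EADCGLRPLFEKKSLEDKTERELLESYID
>5ivx_13
MSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRARWIEQEGPEYW
ERETRRAKGNEQSFRVDL
EOF

# Assumed pattern: PDB-style code, underscore, then a suffix of 2+ characters.
# cut -c2- drops the leading ">" so only the IDs are listed.
long_ids=$(grep -E '^>[0-9][A-Za-z0-9]{3}_[A-Za-z0-9]{2,}' "$sample" | cut -c2-)
echo "$long_ids"

rm -f "$sample"
```

For the sample above, only 5ivx_13 is listed, which matches the observation that the double-digit entity IDs are the ones that fail.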

blast mkblastdb fasta sequence protein • 4.1k views


GenoMax · Answer 1 · 2019-07-17

Build a v5 database by adding the following option. The default is 4, so you will want to add -blastdb_version 5 to your command. The support announcement from NCBI is here.

-blastdb_version <Integer, 4..5>
   Version of BLAST database to be created
   Default = `4'
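
Put together, the full command might look like the sketch below. The file name chains.fasta and the database name chains_db are hypothetical placeholders, and -parse_seqids is included on the assumption that it was part of the original command; only -blastdb_version 5 comes from the answer itself.

```shell
# Hypothetical input and output names; replace with your own.
FASTA="chains.fasta"
DB_NAME="chains_db"
DB_VERSION=5   # the default in makeblastdb 2.9.0 is 4

# Only run when makeblastdb is installed and the input file exists.
if command -v makeblastdb >/dev/null 2>&1 && [ -f "$FASTA" ]; then
    makeblastdb -in "$FASTA" -dbtype prot -parse_seqids \
        -blastdb_version "$DB_VERSION" -out "$DB_NAME"
else
    echo "makeblastdb or $FASTA not found; nothing was built"
fi
```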


Thanks! That solved the issue.

Reply 4.8 years ago by ravila

Hi all, I ran makeblastdb -in blastdb_nt.fasta -input_type fasta -dbtype nucl -title hbbNtBlast -parse_seqids -out hbbNtBlast and received the error BLAST Database creation error: Multi-letters chain PDB id is not supported in v4 BLAST DB. I then applied your method. After finishing the version update, I ran the same makeblastdb command, but I got the same error again. What should I do?

Reply 4.8 years ago by 13526801402

This is my first time posting here, so I hope I have made my issue clear. I would appreciate any suggestions.

Reply 4.8 years ago by 13526801402

As you can see above, the original question involves protein sequences. Judging by your command, you have nucleotide sequences, so I am not sure the error you are seeing has the same cause as the one in the original question.

Can you post the output of grep "^>" blastdb_nt.fasta | head -5 so we can take a look at what your FASTA headers look like?

Reply 4.8 years ago by GenoMax

Yes, exactly. I am running blastn, which is for nucleotide sequences. Here is the output of that command:

>X17276.1 Giant Panda satellite 1 DNA
>X51700.1 Bos taurus mRNA for bone Gla protein
>X68321.1 B.taurus mRNA for cyclin A
>X55027.1 Bovine mRNA for chromogranin B
>Z12029.1 B.indicus gene for alpha-lactalbumin

Reply written 4.8 years ago by 13526801402, updated 4.8 years ago by GenoMax
