Question: Help making a blast database with longer sequence IDs
ravila0 wrote (5 weeks ago):

I am trying to create a BLAST database with protein chain sequences. My FASTA looks like this:

>1fig_1
ENVLTQSPAIMSASPGEKVTMACRASSSVSSTYLHWYQQKSGASPKLLIYSTSNLASGVP
ARFSGS
>1p8v_2
EADCGLRPLFEKKSLEDKTERELLESYID 
>5ivx_13
MSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRARWIEQEGPEYW
ERETRRAKGNEQSFRVDL
.. etc.

The number after the underscore is the entity ID from the mmCIF file. (I am not using chain IDs because they are more redundant.)

I keep getting the error:

BLAST Database creation error: Multi-letters chain PDB id is not supported in v4 BLAST DB

The problem, I gather, is that makeblastdb seems to expect a maximum of six characters in the id, so if the number after the underscore is double-digit, it fails.

However, given that some PDB entries have more than 9 entities, I need to use more characters. Is there a way to get around this limit? (I am currently using makeblastdb version 2.9.0.)
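To see which records would exceed the six-character length described above, a minimal sketch along these lines may help. The file name sample_chains.fasta is hypothetical, and the six-character threshold is only the asker's reading of the error, not a documented makeblastdb limit.

```shell
# Hypothetical sample file reproducing the headers from the question.
cat > sample_chains.fasta <<'EOF'
>1fig_1
ENVLTQSPAIMSASPGEKVTMACRASSSVSSTYLHWYQQKSGASPKLLIYSTSNLASGVP
>1p8v_2
EADCGLRPLFEKKSLEDKTERELLESYID
>5ivx_13
MSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRARWIEQEGPEYW
EOF

# Print every sequence ID (the first word after ">") longer than six characters.
# The threshold of six is an assumption taken from the question.
long_ids=$(awk '/^>/ { id = substr($1, 2); if (length(id) > 6) print id }' sample_chains.fasta)
echo "$long_ids"
```

On the sample above, only the double-digit entity ID is reported.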

written 5 weeks ago by ravila0
genomax70k (United States) wrote (5 weeks ago):

Build a v5 database by adding the option below. The default is 4, so you will want to add -blastdb_version 5 to your makeblastdb command. The support announcement from NCBI is here.

-blastdb_version <Integer, 4..5>
   Version of BLAST database to be created
   Default = `4'
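As a rough sketch of how the option fits into a full command: the file and database names below (chains.fasta, chains_db) are hypothetical placeholders, while -in, -dbtype, -out and -blastdb_version are the flags that appear in this thread. The command is printed first and only run if makeblastdb and the input file are actually present.

```shell
# Hypothetical input and output names; adjust to your own data.
fasta="chains.fasta"
db="chains_db"

# Same makeblastdb call as before, with -blastdb_version 5 appended.
# Kept in a plain string, so this assumes the names contain no spaces.
cmd="makeblastdb -in $fasta -dbtype prot -out $db -blastdb_version 5"
echo "$cmd"

# Run only when the tool and the input file exist.
if command -v makeblastdb >/dev/null 2>&1 && [ -f "$fasta" ]; then
    $cmd
fi
```

If the original command used other options, such as -parse_seqids or -title, they would stay as they were; only the database version changes.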
modified 5 weeks ago • written 5 weeks ago by genomax70k

Thanks! That solved the issue.

written 5 weeks ago by ravila0

Hi all, I ran makeblastdb -in blastdb_nt.fasta -input_type fasta -dbtype nucl -title hbbNtBlast -parse_seqids -out hbbNtBlast and received the error BLAST Database creation error: Multi-letters chain PDB id is not supported in v4 BLAST DB. I then applied your method. After the version update I ran the same makeblastdb command, but I got the same error again. What should I do?

written 5 weeks ago by 135268014020

This is my first time posting here; I hope I have made my issue reasonably clear. I would greatly appreciate any suggestions.

written 5 weeks ago by 135268014020

As you can see above, the original question involves protein sequences. Judging by your command, you have nucleotide sequences, so I am not sure whether your error has the same cause as the one in the original question.

Can you post the output of grep "^>" blastdb_nt.fasta | head -5 so we can take a look at what your FASTA headers look like?
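The first five headers may not include the record that triggers the error, so it can be worth scanning every header. The sketch below rests on an assumption: that the error is raised by IDs resembling a four-character PDB code followed by an underscore and more than one character, as in the original question. The sample file, its sequences, and the 5ivx_13 record in it are hypothetical stand-ins.

```shell
# Hypothetical sample: two headers from this thread plus one PDB-like ID.
cat > sample_headers.fasta <<'EOF'
>X17276.1 Giant Panda satellite 1 DNA
ACGT
>5ivx_13 hypothetical record with a PDB-like ID
ACGT
>X68321.1 B.taurus mRNA for cyclin A
ACGT
EOF

# Report line number and ID for headers whose ID looks like
# <digit + 3 alphanumerics>_<two or more alphanumerics> (assumed pattern).
suspects=$(awk '/^>/ { id = substr($1, 2)
    if (id ~ /^[0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9]_[A-Za-z0-9][A-Za-z0-9]+$/) print NR ": " id }' sample_headers.fasta)
echo "$suspects"
```

Pointing the same awk command at blastdb_nt.fasta would list any candidate records together with their line numbers.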

written 5 weeks ago by genomax70k

Yes, exactly: I am running blastn, which is for nucleotide sequences. Here is the output of that command:

>X17276.1 Giant Panda satellite 1 DNA
>X51700.1 Bos taurus mRNA for bone Gla protein
>X68321.1 B.taurus mRNA for cyclin A
>X55027.1 Bovine mRNA for chromogranin B
>Z12029.1 B.indicus gene for alpha-lactalbumin
modified 5 weeks ago by genomax70k • written 5 weeks ago by 135268014020
