I am trying to create a BLAST database with protein chain sequences. My FASTA looks like this:
>1fig_1
ENVLTQSPAIMSASPGEKVTMACRASSSVSSTYLHWYQQKSGASPKLLIYSTSNLASGVP
ARFSGS
>1p8v_2
EADCGLRPLFEKKSLEDKTERELLESYID
>5ivx_13
MSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRARWIEQEGPEYW
ERETRRAKGNEQSFRVDL
... etc.
The number after the underscore is the entity ID from the mmCIF file. (I am not using chain IDs, because they would be more redundant.)
I keep getting the error:
BLAST Database creation error: Multi-letters chain PDB id is not supported in v4 BLAST DB
The problem, I gather, is that makeblastdb seems to expect a maximum of six characters in the id, so if the number after the underscore is double-digit, it fails.
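To make the suspected pattern concrete, here is a rough Python sketch that lists the headers likely to trigger the error. It rests on an assumption, not on anything confirmed from the BLAST source: that an ID shaped like a 4-character PDB code plus an underscore is parsed as a PDB identifier, and that whatever follows the underscore is read as the chain ID. The regular expression and the function name `multi_char_suffix_ids` are made up for illustration.

```python
import re

# Assumption (not verified against makeblastdb itself): an ID of the form
# <digit + 3 alphanumerics>_<suffix> is treated as a PDB ID, and the suffix
# is interpreted as a chain ID.
PDB_LIKE = re.compile(r"^([0-9][A-Za-z0-9]{3})_(\S+)$")


def multi_char_suffix_ids(fasta_lines):
    """Return the PDB-like IDs whose suffix is longer than one character."""
    flagged = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        parts = line[1:].split()
        if not parts:
            continue  # empty header
        seq_id = parts[0]  # the ID is the first token of the header
        match = PDB_LIKE.match(seq_id)
        if match and len(match.group(2)) > 1:
            flagged.append(seq_id)
    return flagged


sample = [
    ">1fig_1",
    "ENVLTQSPAIMSASPGEKVTMACRASSSVSSTYLHWYQQKSGASPKLLIYSTSNLASGVP",
    ">1p8v_2",
    "EADCGLRPLFEKKSLEDKTERELLESYID",
    ">5ivx_13",
    "MSHSLRYFVTAVSRPGFGEPRYMEVGYVDNTEFVRFDSDAENPRYEPRARWIEQEGPEYW",
]

print(multi_char_suffix_ids(sample))  # prints ['5ivx_13']
```

On the three example records only 5ivx_13 is flagged, which matches the observation that the double-digit entity IDs are the ones that fail.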
However, since some PDB entries have more than 9 entities, I need to use more characters. Is there a way to get around this limit? (I am currently using makeblastdb version 2.9.0.)
Thanks! That solved the issue.
Hi guys, I typed
makeblastdb -in blastdb_nt.fasta -input_type fasta -dbtype nucl -title hbbNtBlast -parse_seqids -out hbbNtBlast
Then I received the error:
BLAST Database creation error: Multi-letters chain PDB id is not supported in v4 BLAST DB
Then I applied your method. After finishing the version update, I ran the same makeblastdb command, but I got the same error again. What should I do? This is my first time asking here, so I hope I have made my issue reasonably clear. I would very much appreciate any suggestions.
As you can see above, the original question involves protein sequences. Looking at your command, it looks like you have nucleotide sequences, so I am not sure whether the error you are seeing has the same cause as the one in the original question.
Can you post the output of
grep "^>" blastdb_nt.fasta | head -5
so we can take a look at what your FASTA headers look like?
Yeah, exactly, I am running blastn, which is for nucleotide sequences. Anyway, I will post the result of the run below.