The use of 'fastacmd' and 'blastdbcmd' suggests you are trying to get the UniProtKB sequences from an NCBI BLAST database. Depending on how the database was constructed, look-ups using the various identifiers may or may not work.
Firstly, the NCBI BLAST database needs to have been built with indexing of the sequence identifiers enabled (i.e. with '-oT' for 'formatdb' or '-parse_seqids' for 'makeblastdb'). The BLAST databases provided on the NCBI's FTP site should all have this enabled, but other NCBI BLAST databases may have been created without it.
For the 'nr' BLAST database provided by NCBI, look-ups are supported using all the entry identifiers appearing in the fasta header line. So for UniProtKB:WAP_RAT the 'nr' fasta header line is:
>gi|139691|sp|P01174.2|WAP_RAT RecName: Full=Whey acidic protein; Short=WAP; AltName: Full=Whey phosphoprotein; Flags: Precursor >gi|5679681|emb|CAA25600.2| whey acidic protein [Rattus norvegicus]
This means we can search 'nr' with:
NCBI gi number:
blastdbcmd -db nr -dbtype prot -entry '139691' -get_dups
blastdbcmd -db nr -dbtype prot -entry '5679681' -get_dups
UniProtKB accession:
blastdbcmd -db nr -dbtype prot -entry 'P01174' -get_dups
UniProtKB sequence version accession:
blastdbcmd -db nr -dbtype prot -entry 'P01174.2' -get_dups
UniProtKB entry name (a.k.a. UniProtKB ID):
blastdbcmd -db nr -dbtype prot -entry 'WAP_RAT' -get_dups
INSDC protein_id:
blastdbcmd -db nr -dbtype prot -entry 'CAA25600' -get_dups
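To see which identifiers a given 'nr' style header line offers for look-up, something like the following rough sketch may help. The function name 'defline_ids' is made up for illustration, and the parsing is only a heuristic that assumes the 'tag|value|tag|value|name' layout shown in the header above; other NCBI identifier layouts may not split cleanly this way. It prints the versioned accessions as they appear in the header (e.g. 'P01174.2'), not the unversioned forms.

```shell
# Hypothetical helper: list the identifiers found in NCBI-style FASTA
# header lines read from stdin (heuristic, not an official parser).
defline_ids() {
    awk '/^>/ {
        sub(/^>/, "")                         # drop the leading ">"
        n = split($0, entries, " >")          # merged entries are joined by " >"
        for (i = 1; i <= n; i++) {
            split(entries[i], words, " ")     # first word holds the identifiers
            m = split(words[1], f, "|")
            for (j = 2; j <= m; j++) {
                # even fields are values (gi number, accession);
                # a non-empty last field is the entry name
                if (f[j] != "" && (j % 2 == 0 || j == m)) print f[j]
            }
        }
    }'
}
```

For example, piping the header line above into 'defline_ids' should list the gi numbers, the versioned accessions and the entry name, any of which can then be passed to '-entry'.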
For BLAST databases which were built from fasta format data which used an alternative header format, for example a 'uniprotkb' BLAST database generated from the UniProtKB fasta files provided by EMBL-EBI (ftp://ftp.ebi.ac.uk/pub/databases/fastafiles/uniprot/) which use the fasta header format:
>SP:WAP_RAT P01174 Whey acidic protein OS=Rattus norvegicus GN=Wap PE=1 SV=2
NCBI BLAST's support for parsing the identifiers in such headers can be insufficient, in which case the entries can only be retrieved using the generic fasta identifier (i.e. the first "word" on the header line):
blastdbcmd -db uniprotkb -dbtype prot -entry 'SP:WAP_RAT' -get_dups
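If you are unsure what the generic identifiers look like for your own fasta file, a small sketch along these lines can list them. The function name 'fasta_ids' is an assumption for illustration; it simply prints the first "word" of each header line with the leading '>' removed.

```shell
# Hypothetical helper: print the first word of every FASTA header line,
# reading from the named files or from stdin when no file is given.
fasta_ids() {
    awk '/^>/ { print substr($1, 2) }' "$@"
}
```

Run on the EMBL-EBI style header above, this should print 'SP:WAP_RAT', which is the value to pass to '-entry'.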
The 'fastacmd' program works in exactly the same way, but the command-line syntax is a little different. For example, fetching the example sequence from above using the UniProtKB sequence version accession uses the command-line:
fastacmd -d nr -pT -s 'P01174.2' -aT
Note: 'fastacmd' and 'blastdbcmd' support batch retrieval using a comma-separated list of identifiers, so when fetching many entries you may want to batch them for efficiency. The queries above use '-get_dups' or '-aT' to allow for cases where an identifier may correspond to multiple sequences (this shouldn't happen in these databases, but you never know).
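As a rough sketch of the batching idea, the following joins identifiers (one per line) into comma-separated groups. The function name 'make_batches' and the batch size are assumptions chosen for illustration, not limits imposed by 'blastdbcmd'.

```shell
# Hypothetical helper: read identifiers from stdin, one per line, and
# print them as comma-separated batches of at most $1 entries (default 100).
make_batches() {
    batch_size="${1:-100}"
    awk -v n="$batch_size" '
        NF == 0 { next }                              # skip blank lines
        {
            line = (count % n == 0) ? $1 : line "," $1
            count++
            if (count % n == 0) { print line; line = "" }
        }
        END { if (count % n != 0) print line }        # flush a partial batch
    '
}

# Possible usage, assuming ids.txt exists and an indexed 'nr' database
# is available locally:
# make_batches 50 < ids.txt | while read -r batch; do
#     blastdbcmd -db nr -dbtype prot -entry "$batch" -get_dups
# done
```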
If you do not have an appropriate NCBI BLAST database for these look-ups, then web-based options such as those mentioned in the other answers (e.g. UniProt.org RESTful API, EMBL-EBI dbfetch, NCBI E-utils, etc.) may be more appropriate, depending on how much of the database you need. Otherwise you may want to download the data and appropriate indexing software (e.g. NCBI BLAST, EMBOSS, BioPerl, etc.) in order to perform the look-ups locally.
That's impressive. Good to see it explained in detail. Very useful answer, I really appreciate it.
Great answer. Is there a built-in way to limit the search to only the initial gi? e.g. in your example above, retrieve the FASTA entry via
but not by: