With KEGG it is possible to retrieve the amino acid sequence of the protein encoded by a gene, in FASTA format, in the following way:
Retrieve sequence entries in FASTA format:
When the entry contains multiple sequences, specify as follows:
-f+-n+1 first sequence in FASTA format
-f+-n+2 second sequence in FASTA format
-f+-n+a amino acid sequence in FASTA format (KEGG GENES only)
-f+-n+n nucleotide sequence in FASTA format (KEGG GENES only)
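To make the option syntax above concrete, here is a minimal Python sketch that assembles such a request as a URL. It assumes the options are passed to the DBGET web interface (`www_bget`) at the base URL shown in the code, which is not stated in this post. The helper name `bget_fasta_url` is made up for illustration, and the entry ID simply combines an organism ID and a gene code mentioned further down.

```python
from urllib.parse import quote

# Assumption: base URL of the DBGET web interface (not given in the post).
WWW_BGET = "https://www.genome.jp/dbget-bin/www_bget"


def bget_fasta_url(entry, which="a"):
    """Build a www_bget URL requesting one sequence of `entry` in FASTA format.

    which: "1", "2", ... for the n-th sequence, "a" for the amino acid
    sequence, "n" for the nucleotide sequence (the last two apply to
    KEGG GENES entries only).
    """
    which = str(which)
    if not (which in ("a", "n") or which.isdigit()):
        raise ValueError(f"unsupported sequence selector: {which!r}")
    # Options and entry are joined with '+', as in the option list above.
    return f"{WWW_BGET}?-f+-n+{which}+{quote(entry, safe=':')}"


# Entry ID built from an organism ID and gene code quoted in this post.
print(bget_fasta_url("bau:BUAPTUC7_480"))
```

The function only builds the URL; fetching it (for example with `urllib.request.urlopen`) is left out so the sketch runs without network access.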
The list of options may be viewed by the -h option:
This approach has some limitations: it returns only one copy of a gene (even when there are multiple copies), and it prints no sequence if a gene is not annotated with the gene name being searched, as in this example:
where BAU, WBR, etc. are the KEGG organism IDs and BUAPTUC7_480, WGLp242, etc. are the gene codes. As you can see, the folD orthologs in SGL, ENT, ENC and ESA are not marked with "(folD)", which prevents their sequences from being retrieved.
In the KEGG database each gene also has an orthology ID (K01491 in the following example):
folD; methylenetetrahydrofolate dehydrogenase (NADP+) / methenyltetrahydrofolate cyclohydrolase [EC:1.5.1.5 3.5.4.9]
Is there any way to retrieve gene sequences in FASTA format using the KEGG orthology code (K01491) instead of the gene name (folD)?
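One route that might work, sketched here under assumptions rather than as a confirmed answer: the public KEGG REST service (not mentioned above) has a `link` operation that lists the genes assigned to a KO entry, and a `get` operation with an `aaseq` option that returns amino acid FASTA for up to 10 entries per call. The function names below (`parse_link_output`, `chunks`, `fasta_for_ko`) are hypothetical helpers, not part of any KEGG tool.

```python
import urllib.request

# Assumption: the public KEGG REST service; it is not mentioned in the post.
KEGG_REST = "https://rest.kegg.jp"


def parse_link_output(text):
    """Parse tab-separated 'ko:K01491<TAB>org:gene' lines into gene IDs."""
    genes = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].strip():
            genes.append(parts[1].strip())
    return genes


def chunks(items, size=10):
    """Yield successive slices; KEGG 'get' takes at most 10 entries per call."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def fetch(url):
    """Download a URL and return its body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def fasta_for_ko(ko_id, organisms=None, fetch=fetch):
    """Return amino acid FASTA text for the genes assigned to `ko_id`.

    organisms: optional iterable of KEGG organism IDs used to filter genes.
    fetch: injectable downloader, so the logic can be tried offline.
    """
    genes = parse_link_output(fetch(f"{KEGG_REST}/link/genes/{ko_id}"))
    if organisms is not None:
        wanted = {o.lower() for o in organisms}
        genes = [g for g in genes if g.split(":", 1)[0] in wanted]
    return "".join(
        fetch(f"{KEGG_REST}/get/{'+'.join(batch)}/aaseq")
        for batch in chunks(genes)
    )


# Example call (needs network access, so it is left commented out):
# print(fasta_for_ko("K01491", organisms=["bau", "sgl"]))
```

Because the lookup goes through the orthology assignment rather than the gene name, it should also pick up orthologs that are not labelled "(folD)" and every copy assigned to the KO, provided KEGG has actually assigned them to K01491.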