I have a list of NCBI genome accession numbers of the form NC_#######, and I want to download the protein FASTA files that correspond to the genome annotations of those accessions.
I have tried (using Python 2.7):
import os
from Bio import Entrez, SeqIO
Entrez.email = "email@example.com"
id_list = "NC_004757"
handle = Entrez.esearch(db="nuccore", term = id_list)
record = Entrez.read(handle)
gi_list = record["IdList"]
gi_str = ",".join(gi_list)
handle = Entrez.efetch(db="nuccore", id=gi_str, rettype="fasta_cds_aa")
records = list(SeqIO.parse(handle, "fasta"))
for item in records:
    print(item.id)
However, this takes so long to run that I suspect something is wrong. How can I retrieve these genome-annotation protein FASTA files in bulk?
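Below is a minimal sketch of a batched approach. It assumes Biopython is installed and that every accession is a nuccore record. `efetch` accepts accession numbers directly, so the `esearch` round-trip for GI numbers can be skipped. Several accessions can also be sent in a single request as a comma-separated list. The helper names `chunks` and `fetch_protein_fasta`, the batch size of 50, and the output file name are hypothetical choices for illustration and do not come from the original code.

```python
def chunks(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_protein_fasta(accessions, out_path, email, batch_size=50):
    """Write the CDS translations of many nuccore accessions to one FASTA file.

    Hypothetical helper: one efetch request per batch of accessions,
    instead of an esearch + efetch pair per genome.
    """
    # Imported here so chunks() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address on every request
    with open(out_path, "w") as out:
        for batch in chunks(accessions, batch_size):
            # efetch takes accession strings directly, so no GI lookup is needed.
            handle = Entrez.efetch(db="nuccore", id=",".join(batch),
                                   rettype="fasta_cds_aa", retmode="text")
            try:
                # fasta_cds_aa returns one protein record per annotated CDS.
                out.write(handle.read())
            finally:
                handle.close()
```

For example, `fetch_protein_fasta(["NC_004757"], "proteins.faa", "email@example.com")` should write the annotated protein sequences for that genome to `proteins.faa`. No explicit `sleep` is added because Biopython's `Entrez` module already throttles itself to NCBI's limit of three requests per second.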