I have a list of NCBI genome accession numbers of the form NC_#######, and I want to download the protein FASTA files corresponding to the genome annotations for those accessions.
I have tried (using Python 2.7):
    import os
    from Bio import Entrez, SeqIO

    Entrez.email = "email@example.com"
    id_list = "NC_004757"

    # Look up the record IDs for the accession
    handle = Entrez.esearch(db="nuccore", term=id_list)
    record = Entrez.read(handle)
    gi_list = record["IdList"]
    gi_str = ",".join(gi_list)

    # Fetch the translated CDS features as protein FASTA
    handle = Entrez.efetch(db="nuccore", id=gi_str, rettype="fasta_cds_aa")
    records = list(SeqIO.parse(handle, "fasta"))
    for item in records:
        print(item.id)
But this takes so long to run that I believe there must be an issue. Any idea how I can access these genome annotation FASTA files in bulk?
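
For reference, here is a rough sketch of the kind of bulk download I have in mind. It is not a tested solution, and it rests on a few assumptions: that `Entrez.efetch` accepts the NC_ accessions directly in `id` (so the `esearch` step can be skipped), that the response can be written straight to disk without parsing it with `SeqIO`, and that requesting a modest number of accessions per call is acceptable to NCBI. The names `chunks`, `fetch_protein_fasta`, `batch_size`, and the injectable `fetch` argument are hypothetical helpers of my own, not part of Biopython.

```python
def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_protein_fasta(accessions, out_path, batch_size=50,
                        email="email@example.com", fetch=None):
    """Write protein FASTA for all accessions to out_path.

    `fetch` is a hypothetical hook: a function taking a list of
    accessions and returning FASTA text. If omitted, Entrez.efetch
    is used (assumes Biopython is installed and the network is up).
    Returns the number of batches requested.
    """
    if fetch is None:
        from Bio import Entrez
        Entrez.email = email

        def fetch(ids):
            # Assumption: accessions can be passed directly as ids
            handle = Entrez.efetch(db="nuccore", id=",".join(ids),
                                   rettype="fasta_cds_aa", retmode="text")
            try:
                return handle.read()
            finally:
                handle.close()

    n_batches = 0
    with open(out_path, "w") as out:
        for batch in chunks(accessions, batch_size):
            # Stream each batch to disk instead of holding records in memory
            out.write(fetch(batch))
            n_batches += 1
    return n_batches
```

With the list of accessions loaded, this would be called as `fetch_protein_fasta(["NC_004757"], "proteins.faa")`. I do not know whether this actually addresses the slowness, so corrections are welcome.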