Hello everyone: I'm having a problem trying to download gene sequences from the Gene database on the NCBI website using Biopython. I started by setting up a basic test search for two gene sequences in the "gene" database for S. coelicolor (txid100226).
from Bio import Entrez
Entrez.email = "firstname.lastname@example.org"
handle = Entrez.esearch(db="gene", term="txid100226[Organism]", retmax=2)
record = Entrez.read(handle)
The ID of the first hit from this search is:
record_list = record["IdList"]
print(record_list[0])
1096915
I then used this first ID to download the gene of interest:
seq = Entrez.efetch(db="gene",id=record_list,rettype="fasta").read()
However the result stored in "seq" is the following:
- Other Aliases:
- SCO1489, SC9C5.13, bldD
- Genomic context:
- NC_003888.3 (1592381..1592884)
If I put db="protein" instead of gene I get the correct protein sequence.
I realize that one way to get the DNA sequence would be to download it manually, directly from the contig NC_003888.3 of S. coelicolor at positions 1592381..1592884 for this particular ID. That information is stored in "seq".
So here is the question: Is there any method (or trick) to download that DNA sequence using biopython? How can I solve this problem?
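For reference, here is a rough sketch of how the manual route described above might be automated. It assumes the "Genomic context" line always has the form "ACCESSION (start..stop)", optionally with ", complement" for genes on the minus strand (that optional part is an assumption, not something shown in the output above). The helper names parse_genomic_context and fetch_gene_dna are hypothetical. The fetch step relies on the seq_start, seq_stop and strand parameters of the E-utilities efetch call, which Biopython forwards as keyword arguments; it needs network access and has not been run against this record.

```python
import re


def parse_genomic_context(text):
    """Pull (accession, start, stop, strand) out of text containing a
    line such as 'NC_003888.3 (1592381..1592884)'.

    strand follows the E-utilities convention: 1 = plus, 2 = minus.
    The ', complement' suffix is an assumed marker for the minus strand.
    """
    match = re.search(r"(\S+)\s*\((\d+)\.\.(\d+)(,\s*complement)?\)", text)
    if match is None:
        raise ValueError("no genomic coordinates found in text")
    accession, start, stop, complement = match.groups()
    return accession, int(start), int(stop), 2 if complement else 1


def fetch_gene_dna(accession, start, stop, strand=1,
                   email="firstname.lastname@example.org"):
    """Fetch the given region of a nucleotide record as FASTA text."""
    # Imported here so the parser above works without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="nuccore", id=accession,
                           rettype="fasta", retmode="text",
                           seq_start=start, seq_stop=stop, strand=strand)
    try:
        return handle.read()
    finally:
        handle.close()
```

Usage would be something like fetch_gene_dna(*parse_genomic_context(seq)), where seq is the text returned from the "gene" database above.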