Question: Difficult to Download Gene Sequences from NCBI
jcastrofigueroa (Norwich, UK), 7.1 years ago, wrote:

Hello everyone: I'm having a problem downloading gene sequences from the NCBI Gene database using Biopython. I started with a basic test search for two gene records in the "gene" database for S. coelicolor (txid100226).

from Bio import Entrez
Entrez.email = "chief@marsstation.com"  # NCBI asks for a contact address
handle = Entrez.esearch(db="gene", term="txid100226[Organism]", retmax=2)
record = Entrez.read(handle)
handle.close()

The ID of the first hit in this search is:

record_list = record["IdList"]
print(record_list[0])
1096915

I then used this first ID to download the gene of interest:

seq = Entrez.efetch(db="gene",id=record_list[0],rettype="fasta").read()

However, the result stored in "seq" is the following:


http://www.ncbi.nlm.nih.gov/data_specs/dtd/NCBI_Entrezgene.dtd">
<Entrezgene-Set>

SCO1489 –DNA-binding protein [Streptomyces coelicolor A3(2)]

DNA-binding protein

Other Aliases:
SCO1489, SC9C5.13, bldD
Genomic context:
Chromosome
Annotation:
NC_003888.3 (1592381..1592884)
ID:
1096915
</Entrezgene-Set>

If I use db="protein" instead of db="gene", I get the correct protein sequence.

I realize that one way to get the DNA sequence is to download it manually, directly from the contig NC_003888.3 of S. coelicolor, at positions 1592381..1592884 for this particular ID. That location information is stored in "seq".

So here is the question: is there any method (or trick) to download that DNA sequence using Biopython? How can I solve this problem?

JFC

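The manual route described in the question (take NC_003888.3 and cut out 1592381..1592884) can be scripted by asking EFetch for just that slice of the nucleotide record instead of the Gene record. Below is a minimal sketch, assuming Biopython's `Entrez.efetch` passes the E-utilities `seq_start`, `seq_stop` and `strand` parameters through to NCBI (1-based, inclusive positions; strand 1 = plus, 2 = minus). The helper names `region_params` and `fetch_region` and the e-mail address are hypothetical placeholders, not part of Biopython.

```python
def region_params(accession, start, stop, strand=1):
    """Build EFetch keyword arguments for one 1-based, inclusive region
    of a nucleotide record (hypothetical helper)."""
    if not 1 <= start <= stop:
        raise ValueError("need 1 <= start <= stop; use strand=2 for the minus strand")
    if strand not in (1, 2):
        raise ValueError("strand must be 1 (plus) or 2 (minus)")
    return {
        "db": "nuccore",      # nucleotide database, not "gene"
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,   # first base, 1-based
        "seq_stop": stop,     # last base, inclusive
        "strand": strand,     # 1 = plus, 2 = minus
    }


def fetch_region(accession, start, stop, strand=1, email="you@example.com"):
    """Download one region as FASTA text; needs Biopython and network access."""
    # Imported here so region_params() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # placeholder: NCBI asks for a real contact address
    handle = Entrez.efetch(**region_params(accession, start, stop, strand))
    try:
        return handle.read()
    finally:
        handle.close()
```

With network access, `print(fetch_region("NC_003888.3", 1592381, 1592884))` should print a FASTA record covering only that region rather than the whole contig.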
Neilfws (Sydney, Australia), 7.1 years ago, wrote:

The short answer is that rettype="fasta" is not a valid return type for the Gene database. Please refer to Table 1 in the EFetch section of the NCBI E-utilities documentation.

As for the longer answer (how to solve this problem), I'll edit this answer later; I have no time to write it just now.


Even if I change the rettype, it doesn't work. The gene sequence in this example lies within a contig sequence, so the GI number for this sequence points to the contig. I don't know how to solve it, but thank you for your answer.

(reply by jcastrofigueroa, 7.1 years ago)

Well no, changing rettype won't work. The only valid rettype for db=Gene is gene_table; valid retmodes are asn.1, xml and text. In short: sequences cannot be retrieved from the Gene database.

(reply by Neilfws, 7.1 years ago)
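Since the Gene record itself carries the genomic location (NC_003888.3, 1592381..1592884 in the output above), one workaround is to read that location from the Gene summary and then fetch the region from the nucleotide database. A sketch, assuming `Entrez.esummary(db="gene", ...)` parsed with `Entrez.read` yields a `DocumentSummarySet` whose entries have a `GenomicInfo` list with `ChrAccVer`, `ChrStart` and `ChrStop` fields; to my knowledge those positions are 0-based, with `ChrStart > ChrStop` for genes on the minus strand, but check this against the E-utilities documentation. The helper names are hypothetical.

```python
def genomic_info_to_region(info):
    """Convert one GenomicInfo entry (assumed 0-based) into 1-based
    EFetch keyword arguments for db="nuccore" (hypothetical helper)."""
    start = int(info["ChrStart"])
    stop = int(info["ChrStop"])
    if start <= stop:
        strand = 1  # plus strand
    else:
        start, stop = stop, start  # minus strand: positions arrive reversed
        strand = 2
    return {
        "db": "nuccore",
        "id": info["ChrAccVer"],
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start + 1,  # 0-based -> 1-based
        "seq_stop": stop + 1,
        "strand": strand,
    }


def gene_locations(gene_id, email="you@example.com"):
    """Return the GenomicInfo entries for one Gene ID.

    Needs Biopython and network access; the e-mail is a placeholder."""
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esummary(db="gene", id=gene_id)
    try:
        summary = Entrez.read(handle)
    finally:
        handle.close()
    doc = summary["DocumentSummarySet"]["DocumentSummary"][0]
    return list(doc["GenomicInfo"])
```

Usage would be along the lines of looping over `gene_locations("1096915")` and passing each converted entry to `Entrez.efetch(**genomic_info_to_region(info))`. The strand handling keeps minus-strand genes in their coding orientation, since EFetch returns the reverse complement when `strand=2`.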
Ashutosh Pandey (Philadelphia), 7.1 years ago, wrote:

Well, I don't use Entrez Gene much, but I think you are retrieving the Entrez Gene page information rather than the sequence. Try "genbank" or "nucleotide" instead of "gene" and see if it helps.


Thanks for your answer, but it didn't work :( If I use "genbank" I get an error, and if I try the nucleotide database I get the whole contig. As for Entrez Gene, I'm sure I'm not retrieving the information page, because I get a protein sequence.

(reply by jcastrofigueroa, 7.1 years ago)
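When the nucleotide database returns the whole contig, the gene can still be cut out locally once that sequence is in hand. A minimal plain-Python sketch, assuming the contig sequence is available as a string and the coordinates are the 1-based, inclusive ones from the Gene record (e.g. 1592381..1592884); `extract_region` is a hypothetical name.

```python
# Complement table for plain DNA; other IUPAC codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_region(sequence, start, stop, minus_strand=False):
    """Return bases start..stop (1-based, inclusive) from a sequence string."""
    piece = sequence[start - 1:stop]  # 1-based inclusive -> Python half-open slice
    if minus_strand:
        piece = piece.translate(_COMPLEMENT)[::-1]  # reverse complement
    return piece
```

With Biopython, the same idea is `record = SeqIO.read(handle, "fasta")` followed by `record.seq[1592380:1592884]`, calling `.reverse_complement()` on the slice for minus-strand genes. Downloading the whole contig just to slice it is wasteful for many genes, though.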
Leandro Lima (San Francisco, CA), 7.1 years ago, wrote:

Hello! I think this could help you.

See the thread "problem when downloading large number of sequences from Genbank".


Not really, since FASTA cannot be retrieved from the Gene database.

(reply by Neilfws, 7.1 years ago)

In this case, use db="nuccore".

(reply by Leandro Lima, 7.1 years ago)
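The FASTA text that EFetch returns for db="nuccore" can be split into a header and a sequence string without extra dependencies (`Bio.SeqIO.read(handle, "fasta")` does the same job if Biopython is at hand). A minimal stdlib sketch, assuming the text holds exactly one record; `parse_single_fasta` is a hypothetical name.

```python
def parse_single_fasta(text):
    """Split text containing one FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("more than one record found")
    header = lines[0][1:]  # drop the leading '>'
    sequence = "".join(line for line in lines[1:] if line)  # join wrapped lines
    return header, sequence
```

Rejecting multi-record input keeps the function honest about its single-record assumption; for several records, `Bio.SeqIO.parse` is the better tool.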