Question: Using Biopython to Download Gene IDs from NCBI
Bright (0) wrote:

Hello,

I am using Biopython's "Entrez.esearch" and "Entrez.read" functions to download a list of gene IDs from NCBI. The code runs without errors, and I have been able to download a couple of IDs. However, Biopython cannot find some of the gene names I provided in the code, even though I can find the gene IDs by searching for them on the NCBI website. There are many genes in my list, so I want to automate the search.

Is there a reason for this problem? Is there another way I could use Biopython to access the IDs from NCBI?

Thanks.

biopython • 4.1k views
Written 6.8 years ago by Bright (0)

It's very hard to help without examples. Can you provide a couple of IDs that you can find via the web interface to Entrez but not via Entrez.esearch?

Written 6.8 years ago by David W (4.7k)

Hi,

Here are the IDs (with the corresponding gene names and locations) that can be found via the web interface but not from Biopython.

453232348, JC8.14, 'IV: 13253845..13254201'

453232067, F35G12.1, 'III: 4568306..4568878'

453232767, snoRNA:ZK994.7, 'V: 8500206..8500537'


Here is the example code that does not return any IDs:

from Bio import Entrez

Entrez.email = "myemail" # Always tell NCBI who you are
search = Entrez.read(Entrez.esearch(db='nucleotide', term='JC8.14[gene] "Caenorhabditis elegans"[orgn]', retmode='xml'))
print(search["IdList"])

Output: []

Here is an example where we do get IDs:

from Bio import Entrez

Entrez.email = "myemail" # Always tell NCBI who you are
search = Entrez.read(Entrez.esearch(db='nucleotide', term='sgk-1[gene] "Caenorhabditis elegans"[orgn]', retmode='xml'))
print(search["IdList"])

Output: ['413004852', '453232919', '392928192', '449020132']

Modified 6.8 years ago • written 6.8 years ago by Bright (0)
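For a longer gene list, the same search could be wrapped in a loop that records which names come back empty. This is only a minimal sketch, assuming Biopython is installed: `build_term`, `search_ids`, and `split_found_missing` are hypothetical helper names (not part of Biopython), and the pause is an assumption based on NCBI's guideline of at most three requests per second without an API key.

```python
import time


def build_term(gene, organism="Caenorhabditis elegans"):
    """Build the same fielded query string used in the post above."""
    return f'{gene}[gene] "{organism}"[orgn]'


def search_ids(term, db="nucleotide"):
    """Run one esearch and return its ID list (needs network access)."""
    from Bio import Entrez  # imported here so the other helpers work offline

    Entrez.email = "myemail"  # placeholder: replace with a real address
    handle = Entrez.esearch(db=db, term=term)
    try:
        return list(Entrez.read(handle)["IdList"])
    finally:
        handle.close()


def split_found_missing(genes, search=search_ids, pause=0.4):
    """Return (found, missing): a dict of gene -> IDs, and the names with no hits."""
    found, missing = {}, []
    for gene in genes:
        ids = search(build_term(gene))
        if ids:
            found[gene] = ids
        else:
            missing.append(gene)
        time.sleep(pause)  # assumed pause to stay under the request rate limit
    return found, missing
```

Calling `split_found_missing(["sgk-1", "JC8.14"])` would then give a list of the names that need a closer look, rather than silently printing empty lists.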

Your search doesn't return anything in the web interface either. If you check out the records returned by just searching on JC8.14 you'll see it's not a gene name...

Modified 6.8 years ago • written 6.8 years ago by David W (4.7k)
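One way to check this point is to run the same name with and without the [gene] field restriction and compare the hit counts. The sketch below assumes Biopython is installed; `candidate_terms` and `hit_counts` are hypothetical helper names, and the explicit `AND` is an assumption added for readability (the original query relied on the implicit one).

```python
def candidate_terms(name, organism="Caenorhabditis elegans"):
    """Return the name restricted to the [gene] field and with no field tag."""
    org = f'"{organism}"[orgn]'
    return {
        "gene field": f"{name}[gene] AND {org}",
        "no field tag": f"{name} AND {org}",
    }


def hit_counts(name, db="nucleotide"):
    """Run each candidate query and return {label: hit count} (needs network)."""
    from Bio import Entrez  # imported here so candidate_terms works offline

    Entrez.email = "myemail"  # placeholder: replace with a real address
    counts = {}
    for label, term in candidate_terms(name).items():
        handle = Entrez.esearch(db=db, term=term)
        try:
            counts[label] = int(Entrez.read(handle)["Count"])
        finally:
            handle.close()
    return counts
```

If the unrestricted query finds records while the [gene] query finds none, the name is probably stored in some other field of the record.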

The way the query is written in the code above is just Biopython syntax. Searching for JC8.14 returns some results, and here is the desired result in FASTA format: http://www.ncbi.nlm.nih.gov/nuccore/453232348?report=fasta

Modified 6.8 years ago • written 6.8 years ago by Bright (0)

The query format isn't Biopython-specific, and if you look at that record (which is a whole chromosome) in GenBank format you'll see JC8.14 isn't a gene name, so searching on the gene field won't find it.

Written 6.8 years ago by David W (4.7k)
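To see where a name such as JC8.14 actually appears in a record, one could fetch just the region of interest in GenBank format and scan the feature qualifiers. This is a rough sketch under several assumptions: Biopython is installed, the coordinates quoted earlier in the thread are positions on that record, and `fetch_region` and `qualifiers_mentioning` are hypothetical helper names. Whether features are returned for a sub-range may depend on the record.

```python
def qualifiers_mentioning(features, name):
    """Return (feature type, qualifier key) pairs whose values contain `name`."""
    hits = []
    for feature in features:
        for key, values in feature.qualifiers.items():
            if isinstance(values, str):
                values = [values]  # tolerate a bare string instead of a list
            if any(name in str(value) for value in values):
                hits.append((feature.type, key))
    return hits


def fetch_region(record_id, start, stop):
    """Fetch part of a nucleotide record as a SeqRecord (needs network access)."""
    from Bio import Entrez, SeqIO  # imported here so the scan above works offline

    Entrez.email = "myemail"  # placeholder: replace with a real address
    handle = Entrez.efetch(
        db="nucleotide",
        id=record_id,
        rettype="gb",
        retmode="text",
        seq_start=start,  # sub-range parameters are passed through to efetch
        seq_stop=stop,
    )
    try:
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()
```

For example, `qualifiers_mentioning(fetch_region("453232348", 13253845, 13254201).features, "JC8.14")` would list which qualifier keys mention the name, which in turn suggests which search field to use.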

I went through the results again and your observation is correct. Thank you very much for the clarification and your help. Do you know of any other way we could pull gene information and sequences from the NCBI website?

Written 6.8 years ago by Bright (0)

Everything you can get from the website you can get via Entrez. It's just a matter of searching for the right IDs or terms against the right fields. Without knowing what you are trying to do, it's not really possible to provide more specific help.

Written 6.8 years ago by David W (4.7k)
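As one illustration of retrieving sequences once the record ID is known, the location strings quoted earlier in the thread could be parsed and passed to efetch. This is a sketch only: it assumes Biopython is installed and that the coordinates are 1-based positions on the record with the listed ID; `parse_location` and `fetch_fasta` are hypothetical helper names.

```python
import re


def parse_location(text):
    """Split a string like 'IV: 13253845..13254201' into (chromosome, start, stop)."""
    match = re.fullmatch(r"\s*(\w+)\s*:\s*(\d+)\.\.(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised location: {text!r}")
    chromosome, start, stop = match.groups()
    return chromosome, int(start), int(stop)


def fetch_fasta(record_id, start, stop):
    """Return the FASTA text for part of a nucleotide record (needs network)."""
    from Bio import Entrez  # imported here so parse_location works offline

    Entrez.email = "myemail"  # placeholder: replace with a real address
    handle = Entrez.efetch(
        db="nucleotide",
        id=record_id,
        rettype="fasta",
        retmode="text",
        seq_start=start,
        seq_stop=stop,
    )
    try:
        return handle.read()
    finally:
        handle.close()
```

A call might look like `_, start, stop = parse_location('IV: 13253845..13254201')` followed by `fetch_fasta("453232348", start, stop)`.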

Thank you for your responses, David. They were very helpful.

Written 6.8 years ago by Bright (0)

I'll echo David's point that unless you give some specific examples we can't help you. In all the cases like this that I have looked at, the user wasn't doing the same search on the website and via Biopython. Often there was a subtle difference, such as missing quotes.

Written 6.8 years ago by Peter (5.8k)
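A practical way to spot such differences is to look at how NCBI translated the query, since the esearch result includes a "QueryTranslation" entry that can be compared with the "Search details" shown on the website. The sketch below assumes Biopython is installed; `describe_search` and `run_search` are hypothetical helper names, and the "ErrorList" key is assumed to be present only when NCBI reports a problem with the query.

```python
def describe_search(result):
    """Summarise the parts of an esearch result that show how the query was read."""
    lines = [
        f"count: {result['Count']}",
        f"translated as: {result.get('QueryTranslation', '')}",
    ]
    # ErrorList (when present) maps an error kind to the offending phrases.
    for kind, items in result.get("ErrorList", {}).items():
        for item in items:
            lines.append(f"{kind}: {item}")
    return lines


def run_search(term, db="nucleotide"):
    """Run an esearch and return the parsed result (needs network access)."""
    from Bio import Entrez  # imported here so describe_search works offline

    Entrez.email = "myemail"  # placeholder: replace with a real address
    handle = Entrez.esearch(db=db, term=term)
    try:
        return Entrez.read(handle)
    finally:
        handle.close()
```

Printing `describe_search(run_search(term))` for the exact string typed into the website makes it easier to see whether the two searches really match.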