Question

Using Biopython to Download Gene IDs from NCBI

10.9 years ago

Bright • 0

Hello,

I am using Biopython's "Entrez.esearch" and "Entrez.read" functions to download a list of gene IDs from NCBI. The code runs without errors, and I have been able to download a couple of IDs. However, Biopython cannot find some of the gene names I provided in the code, even though I can find the gene IDs by searching for them on the NCBI website. There are many genes in my list, so I want to automate the search.

Is there a reason for this problem? Is there another way I could use Biopython to access the IDs from NCBI?

Thanks.

biopython • 6.3k views

10.9 years ago by Bright

It's very hard to help without examples. Can you provide a couple of IDs that you can find via the Entrez web interface but not via Entrez.esearch?

10.9 years ago by David W

Hi,

Here are the IDs (with the corresponding genes and locations) that can be found via the web interface but not from Biopython.

453232348, JC8.14, 'IV: 13253845..13254201'

453232067, F35G12.1, 'III: 4568306..4568878'

453232767, snoRNA:ZK994.7, 'V: 8500206..8500537'

Here is example code that returns no IDs:

from Bio import Entrez

Entrez.email = "myemail"  # Always tell NCBI who you are

search = Entrez.read(Entrez.esearch(db='nucleotide', term='JC8.14[gene] "Caenorhabditis elegans"[orgn]', retmode='xml'))

print(search["IdList"])

Output: []

Here is an example where we do get IDs:

from Bio import Entrez

Entrez.email = "myemail"  # Always tell NCBI who you are

search = Entrez.read(Entrez.esearch(db='nucleotide', term='sgk-1[gene] "Caenorhabditis elegans"[orgn]', retmode='xml'))

print(search["IdList"])

Output: ['413004852', '453232919', '392928192', '449020132']

10.9 years ago by Bright

Your search doesn't return anything in the web interface either. If you check out the records returned by just searching on JC8.14 you'll see it's not a gene name...

10.9 years ago by David W

The way the query is written in the code above is just Biopython syntax. Searching for JC8.14 does return some results, and here is the desired result in FASTA format: http://www.ncbi.nlm.nih.gov/nuccore/453232348?report=fasta

10.9 years ago by Bright

The query format isn't Biopython-specific. If you look at that record (which is a whole chromosome) in GenBank format, you'll see that JC8.14 isn't a gene name, so a search restricted to the gene field won't find it.
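
A minimal sketch of how one might check this from a script, by running the same name once restricted to the gene field and once unrestricted, then comparing the hit counts. The helper names `build_term` and `count_hits` and the placeholder e-mail address are illustrative assumptions, not part of Biopython; only `Entrez.esearch`, `Entrez.read` and the `[gene]`/`[orgn]` tags come from the thread.

```python
def build_term(name, organism, field=None):
    """Build an Entrez query string; field=None leaves the name unrestricted."""
    name_part = f"{name}[{field}]" if field else name
    return f'{name_part} AND "{organism}"[orgn]'


def count_hits(term, db="nucleotide", email="you@example.org"):
    """Return the hit count esearch reports for `term` (needs network access)."""
    # Imported here so build_term can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # placeholder address (assumption): use your own
    handle = Entrez.esearch(db=db, term=term)
    result = Entrez.read(handle)
    handle.close()
    return int(result["Count"])
```

Calling `count_hits(build_term("JC8.14", "Caenorhabditis elegans", "gene"))` and then the same call without the `"gene"` argument should show whether the field restriction is what empties the result. Pause briefly between calls to respect NCBI's request rate limits.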

10.9 years ago by David W

I went through the results again and your observation is correct. Thank you very much for the clarification and your help. Do you know of any other way we could pull gene information and sequences from NCBI?

10.9 years ago by Bright

Everything you can get from the website you can get via Entrez. It's just a matter of searching with the right identifiers against the right fields. Without knowing what you are trying to do, it's not really possible to provide more specific help.
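
As one hedged illustration: if you already have a record ID and a location string like those listed earlier in the thread, a sub-region can be requested directly with `Entrez.efetch` and its `seq_start`/`seq_stop` parameters. The helpers `parse_location` and `fetch_region` are hypothetical names, and the sketch assumes the coordinates are 1-based positions on the record being fetched.

```python
import re


def parse_location(text):
    """Split a string such as 'IV: 13253845..13254201' into (chrom, start, stop)."""
    match = re.fullmatch(r"\s*(\w+):\s*(\d+)\.\.(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised location: {text!r}")
    chrom, start, stop = match.groups()
    return chrom, int(start), int(stop)


def fetch_region(record_id, start, stop, email="you@example.org"):
    """Fetch one sub-region of a nucleotide record as FASTA text (needs network)."""
    # Imported here so parse_location can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # placeholder address (assumption): use your own
    handle = Entrez.efetch(
        db="nucleotide",
        id=record_id,
        rettype="fasta",
        retmode="text",
        seq_start=start,
        seq_stop=stop,
    )
    try:
        return handle.read()
    finally:
        handle.close()
```

For example, `_, start, stop = parse_location('IV: 13253845..13254201')` followed by `fetch_region("453232348", start, stop)` would request only that stretch of the chromosome record rather than the whole sequence.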

10.9 years ago by David W

Thank you for your responses, David. They were very helpful.

10.9 years ago by Bright

I'll echo David's point that unless you give some specific examples we can't help you. In all the cases like this that I have looked at, the user wasn't running the same search on the website and via Biopython. Often there was a subtle difference, such as missing quotes.
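
One way to spot such differences, sketched under assumptions: the parsed `esearch` result includes a `QueryTranslation` entry showing how Entrez interpreted the term, which can be compared with what the website reports for the same search. The helper names `term_variants` and `query_translation` and the placeholder e-mail address are illustrative only.

```python
def term_variants(name, organism):
    """Return two near-identical queries that differ only in quoting."""
    return {
        "quoted": f'{name}[gene] AND "{organism}"[orgn]',
        "unquoted": f"{name}[gene] AND {organism}[orgn]",
    }


def query_translation(term, db="nucleotide", email="you@example.org"):
    """Return Entrez's own interpretation of `term` (needs network access)."""
    # Imported here so term_variants can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # placeholder address (assumption): use your own
    handle = Entrez.esearch(db=db, term=term)
    result = Entrez.read(handle)
    handle.close()
    return result["QueryTranslation"]
```

Printing `query_translation(term)` for each entry of `term_variants("sgk-1", "Caenorhabditis elegans")` should make any difference in interpretation visible before you compare ID lists.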

10.9 years ago by Peter