Using Biopython to Download Gene IDs from NCBI
10.9 years ago
Bright • 0

Hello,

I am using Biopython's "Entrez.esearch" and "Entrez.read" functions to download a list of gene IDs from NCBI. The code runs without errors, and I have been able to download a couple of IDs. However, Biopython cannot find some of the gene names I provide in the code, even though I can find the corresponding IDs by searching on the NCBI website. There are many genes in my list, so I want to automate the search.

Is there a reason for this problem? Is there another way I could use Biopython to access the IDs from NCBI?

Thanks.

biopython • 6.3k views
ADD COMMENT
1
Entering edit mode

It's very hard to help without examples. Can you provide a couple of IDs that you can find via the Entrez web interface but not via Entrez.esearch?

ADD REPLY
0
Entering edit mode

Hi,

Here are the IDs (and corresponding genes) that can be found via the web interface but not from Biopython.

453232348, JC8.14, 'IV: 13253845..13254201'

453232067, F35G12.1, 'III: 4568306..4568878'

453232767, snoRNA:ZK994.7, 'V: 8500206..8500537'


Here is example code that returns no IDs:

from Bio import Entrez

Entrez.email = "myemail" # Always tell NCBI who you are

search = Entrez.read(Entrez.esearch(db='nucleotide', term='JC8.14[gene] "Caenorhabditis elegans"[orgn]', retmode='xml'))

print(search["IdList"])

Output: []

Here is an example where we do get IDs:

from Bio import Entrez

Entrez.email = "myemail" # Always tell NCBI who you are

search = Entrez.read(Entrez.esearch(db='nucleotide', term='sgk-1[gene] "Caenorhabditis elegans"[orgn]', retmode='xml'))

print(search["IdList"])

Output: ['413004852', '453232919', '392928192', '449020132']

ADD REPLY
0
Entering edit mode

Your search doesn't return anything in the web interface either. If you check out the records returned by just searching on JC8.14 you'll see it's not a gene name...

ADD REPLY
0
Entering edit mode

The way it is written in the code above is just Biopython syntax. Searching for JC8.14 on its own returns some results. Here is the desired result in FASTA format: http://www.ncbi.nlm.nih.gov/nuccore/453232348?report=fasta

ADD REPLY
0
Entering edit mode

The query format isn't Biopython-specific; it is the same Entrez query syntax the website uses. If you look at that record (which is a whole chromosome) in GenBank format, you'll see JC8.14 isn't a gene name, so searching the [gene] field won't find it.
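To make the field restriction concrete, here is a minimal sketch (not the original poster's code) that tries each name with the [gene] field first and falls back to an unrestricted search, recording which one produced hits. The helper names build_term and search_ids are hypothetical, and the explicit AND is an assumption about how you want the terms combined.

```python
def build_term(name, organism, field=None):
    """Build an Entrez query string; field=None searches all fields."""
    name_part = f"{name}[{field}]" if field else name
    return f'{name_part} AND "{organism}"[orgn]'


def search_ids(names, organism, db="nucleotide", email="myemail"):
    """Return {name: (field_used, id_list)} for each name (needs network access)."""
    # Imported here so build_term can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # always tell NCBI who you are
    results = {}
    for name in names:
        for field in ("gene", None):  # try [gene] first, then all fields
            handle = Entrez.esearch(db=db, term=build_term(name, organism, field))
            record = Entrez.read(handle)
            handle.close()
            if record["IdList"]:
                break
        results[name] = (field or "all fields", list(record["IdList"]))
    return results


# Example call (hypothetical list of names):
# search_ids(["sgk-1", "JC8.14"], "Caenorhabditis elegans")
```

Names that only match in the unrestricted search are the ones worth inspecting by hand, since the hit may be a whole chromosome record rather than the gene itself.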

ADD REPLY
0
Entering edit mode

I went through the results again and your observation is correct. Thank you very much for the clarification and your help. Do you know of any other way to pull gene information and sequences from the NCBI website?

ADD REPLY
0
Entering edit mode

Everything you can get from the website you can get via Entrez. It's just a matter of having the right IDs and searching them against the right fields. Without knowing what you are trying to do, it's not really possible to provide more specific help.
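As one illustration, if you already have a record ID and coordinates like the ones listed earlier in this thread, something along these lines could pull just that region as FASTA. This is a sketch under assumptions: the coordinates are taken to be 1-based positions on that record, parse_location and fetch_region are hypothetical helper names, and seq_start/seq_stop are the E-utilities efetch parameters passed through Biopython as keyword arguments.

```python
import re


def parse_location(text):
    """Split a string like 'IV: 13253845..13254201' into (chromosome, start, stop)."""
    match = re.fullmatch(r"\s*(\w+)\s*:\s*(\d+)\.\.(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised location: {text!r}")
    chrom, start, stop = match.groups()
    return chrom, int(start), int(stop)


def fetch_region(record_id, start, stop, db="nucleotide", email="myemail"):
    """Fetch one region of a record as FASTA text (needs network access)."""
    # Imported here so parse_location can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # always tell NCBI who you are
    handle = Entrez.efetch(
        db=db,
        id=record_id,
        rettype="fasta",
        retmode="text",
        seq_start=start,  # assumed to be 1-based coordinates on this record
        seq_stop=stop,
    )
    fasta = handle.read()
    handle.close()
    return fasta


# Example call using a row from earlier in the thread:
# chrom, start, stop = parse_location("IV: 13253845..13254201")
# print(fetch_region("453232348", start, stop))
```

Fetching by ID and range avoids the name search entirely, which helps when the name is not indexed in the field you are querying.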

ADD REPLY
0
Entering edit mode

Thank you for your responses, David. They were very helpful.

ADD REPLY
0
Entering edit mode

I'll echo David's point that unless you give some specific examples we can't help you. In all the cases like this that I have looked at, the user wasn't doing the same search on the website and via Biopython. Often there has been a subtle difference, such as missing quotes.
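One way to check for that kind of mismatch is to look at how NCBI interpreted the query. The sketch below is only an illustration: has_balanced_quotes and query_translation are hypothetical helper names, and it relies on the "QueryTranslation" and "Count" entries that Entrez.read returns for an esearch result.

```python
def has_balanced_quotes(term):
    """Cheap sanity check: double quotes in a query should come in pairs."""
    return term.count('"') % 2 == 0


def query_translation(term, db="nucleotide", email="myemail"):
    """Return (translated_query, hit_count) for a term (needs network access)."""
    # Imported here so has_balanced_quotes works without Biopython installed.
    from Bio import Entrez

    if not has_balanced_quotes(term):
        raise ValueError(f"unbalanced quotes in query: {term!r}")
    Entrez.email = email  # always tell NCBI who you are
    handle = Entrez.esearch(db=db, term=term)
    record = Entrez.read(handle)
    handle.close()
    return record["QueryTranslation"], record["Count"]


# Example call:
# print(query_translation('sgk-1[gene] "Caenorhabditis elegans"[orgn]'))
```

If the translated query or the hit count differs from what the website reports for the text you typed there, the two searches are not the same query.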

ADD REPLY
