How to access specific gene data from the NCBI Gene database with Biopython
2.9 years ago

Hi all,

I'm trying to get all the gene synonyms for a certain gene in NCBI with Biopython.

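from Bio import Entrez

Entrez.email = "A.N.Other@example.com"  # NCBI asks for a contact e-mail address
# With empty rettype/retmode, efetch on db="gene" returns the record as ASN.1 text
handle = Entrez.efetch(db="gene", id="3675", rettype="", retmode="")
results = handle.read()
handle.close()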

This code returns all the data for the gene in ASN.1 format (check here for the possible formats being returned).

I have now looked through the whole Biopython documentation, and there seems to be no way to easily access the components of this returned ASN.1 string: no parser, nothing. I even tried a couple of Python ASN.1 packages, but they seem to decode only binary ASN.1 files.

Ideally I'd like the data as a dictionary or something similar, so I can access elements by key. What's the best way to approach this?

Thanks a lot!

2.9 years ago

You could fetch the XML format and turn that into a dictionary like so:

# pip install xmltodict

from pprint import pprint

import xmltodict
from Bio import Entrez

Entrez.email = "A.N.Other@example.com"
# retmode="xml" returns the Entrezgene XML instead of ASN.1 text
handle = Entrez.efetch(db="gene", id="3675", rettype="", retmode="xml")
results = handle.read()
handle.close()

data = xmltodict.parse(results)  # nested dicts keyed by XML tag name
pprint(data)

This prints a gigantic nested dictionary.
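Rather than reading through that dump, you can dig the symbol and synonyms straight out of the parsed dictionary. One catch: xmltodict gives you a list when a tag repeats but a bare string or dict when it appears only once, so a gene with a single synonym (or a query returning a single gene) needs normalising. A minimal sketch, assuming the keys mirror the Entrezgene XML tags (`Entrezgene-Set` > `Entrezgene` > `Entrezgene_gene` > `Gene-ref`, with synonyms in `Gene-ref_syn` > `Gene-ref_syn_E`); the helper names `as_list` and `synonyms_from_xmltodict` are made up for illustration:

```python
def as_list(value):
    """Normalise xmltodict output: repeated tags give a list, a single tag gives a bare value."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def synonyms_from_xmltodict(data):
    """Return a list of (symbol, synonyms) tuples, one per <Entrezgene> record.

    Assumes `data` came from xmltodict.parse() on Entrezgene XML, so the
    nesting is Entrezgene-Set > Entrezgene > Entrezgene_gene > Gene-ref.
    """
    out = []
    for gene in as_list(data["Entrezgene-Set"]["Entrezgene"]):
        ref = gene["Entrezgene_gene"]["Gene-ref"]
        # An absent or empty <Gene-ref_syn> means the gene has no synonyms
        syn_block = ref.get("Gene-ref_syn") or {}
        out.append((ref.get("Gene-ref_locus"), as_list(syn_block.get("Gene-ref_syn_E"))))
    return out
```

With the `data` from above, `synonyms_from_xmltodict(data)` should return one `(symbol, synonyms)` tuple per gene record. If you'd rather skip the normalising helper, xmltodict's `parse()` also takes a `force_list` argument that makes the tags you name always come back as lists.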

You might be much better off getting the information with Entrez Direct (EDirect):

efetch -db gene -id 3675 -format xml > out.xml
cat out.xml | xtract -pattern Gene-ref -element Gene-ref_locus

prints:

ITGA3
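If you'd rather stay in Python, here is a rough standard-library equivalent of that `xtract` call, extended to the synonyms the question asks about. It is a sketch, assuming synonyms sit in `Gene-ref_syn_E` elements under `Entrezgene_gene/Gene-ref` as in the Entrezgene XML; the function name `gene_names` is hypothetical, and the sample record is a hand-trimmed stand-in with placeholder synonyms, not real NCBI output. `ET.fromstring` accepts either the `str` or the `bytes` that `handle.read()` may give you.

```python
import xml.etree.ElementTree as ET


def gene_names(xml_text):
    """Return a list of (symbol, synonyms) tuples, one per <Entrezgene> record."""
    root = ET.fromstring(xml_text)
    results = []
    for gene in root.iter("Entrezgene"):
        # Restrict to the gene's own Gene-ref, not Gene-ref elements nested elsewhere
        ref = gene.find("Entrezgene_gene/Gene-ref")
        if ref is None:
            continue
        symbol = ref.findtext("Gene-ref_locus")
        synonyms = [e.text for e in ref.findall("Gene-ref_syn/Gene-ref_syn_E")]
        results.append((symbol, synonyms))
    return results


# Hand-trimmed placeholder record; SYN-A and SYN-B are not real ITGA3 synonyms
sample = """<Entrezgene-Set>
  <Entrezgene>
    <Entrezgene_gene>
      <Gene-ref>
        <Gene-ref_locus>ITGA3</Gene-ref_locus>
        <Gene-ref_syn>
          <Gene-ref_syn_E>SYN-A</Gene-ref_syn_E>
          <Gene-ref_syn_E>SYN-B</Gene-ref_syn_E>
        </Gene-ref_syn>
      </Gene-ref>
    </Entrezgene_gene>
  </Entrezgene>
</Entrezgene-Set>"""

print(gene_names(sample))  # [('ITGA3', ['SYN-A', 'SYN-B'])]
```

On real data you would pass the `results` from the XML `efetch` call instead of `sample`.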
