Question: Issues Parsing the XML for dbSNP via Biopython
heath20 (United States) wrote, 7.0 years ago:

I am trying to parse the XML (?) returned by Entrez's dbSNP database:

 from Bio import Entrez
 Entrez.email = "xxxxl@gmail.com"  # NCBI asks for a contact address
 handle = Entrez.efetch(db="snp", id="121434622")
 cont = handle.read()  # read() returns the raw response text
 handle.close()

I have seen some posts about how to parse the output from Entrez, for example:

http://stackoverflow.com/questions/11322250/biopython-class-instance-output-from-entrez-read-i-dont-know-how-to-manipula

Strangely, though, the type of cont is str, not a parsed object. I saw a post from 2009 saying this might be an NCBI bug affecting dbSNP, but I am not sure that is still true four years later. Is there an efficient way to parse the information from dbSNP?

Thanks a lot!

David W (New Zealand) wrote, 7.0 years ago:

A couple of things here:

1) The handle you create with Entrez.efetch acts just like a file handle, so reading it into cont gave you a string, not a parsed record. If you print that string, you'll see it's not XML but dbSNP's native format (with many curly braces). To get XML records, you need to set rettype to "xml":

handle=Entrez.efetch(db="snp", id="121434622", rettype="xml")

Ordinarily, you would then parse the contents of that handle with Entrez.read():

record = Entrez.read(handle)
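
A rough way to check point 1 before choosing a parser is to look at the first non-blank character of whatever handle.read() returned. The sketch below is hypothetical (sniff_format is not part of Biopython) and relies on an assumed heuristic: an XML document starts with "<", while the curly-brace native format does not.

```python
def sniff_format(payload):
    """Guess whether an efetch response body is XML or some other text format.

    Heuristic (an assumption, not an NCBI rule): XML starts with '<' once
    leading whitespace and any UTF-8 byte-order mark are stripped.
    """
    # Handles may yield bytes or str depending on the Python/Biopython version.
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8", errors="replace")
    stripped = payload.lstrip("\ufeff \t\r\n")
    return "xml" if stripped.startswith("<") else "text"
```

For example, calling sniff_format(cont) on the string from the question should report "text" if it is the native format described above, and "xml" once rettype="xml" is used.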

2) As it happens, the XML records for dbSNP are a bit different from those of other NCBI databases, and Biopython's Entrez parser doesn't handle them. The question "Find Amino Acid Change For Snp Using Eutils" has some workarounds.
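
One workaround, in the spirit of the linked question though not necessarily the code it uses, is to skip Entrez.read() and parse the XML with Python's standard xml.etree.ElementTree. This is a minimal sketch under stated assumptions: the Rs element and rsId attribute names are assumed from the legacy dbSNP ExchangeSet layout, and fetch_snp_xml and extract_rs_records are hypothetical helper names. Namespace prefixes are stripped so the code does not depend on a particular namespace URI.

```python
import xml.etree.ElementTree as ET


def fetch_snp_xml(rs_id, email):
    """Hypothetical helper: fetch one dbSNP record as XML text (needs network)."""
    from Bio import Entrez  # imported lazily so the parser below works without Biopython

    Entrez.email = email
    # Same call as in the answer above; rettype="xml" requests XML output.
    handle = Entrez.efetch(db="snp", id=rs_id, rettype="xml")
    try:
        return handle.read()
    finally:
        handle.close()


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def extract_rs_records(xml_text):
    """Return one attribute dict per <Rs> element (assumed element name).

    The namespace is ignored, so this works whether or not the document
    declares one. Accepts str or bytes, as ET.fromstring does.
    """
    root = ET.fromstring(xml_text)
    return [dict(elem.attrib) for elem in root.iter() if local_name(elem.tag) == "Rs"]
```

Usage would look like records = extract_rs_records(fetch_snp_xml("121434622", "you@example.com")), after which records[0].get("rsId") gives the RefSNP ID if the assumed layout holds. Nested details (alleles, gene annotations) can be reached the same way by iterating over each Rs element's children and matching on local_name.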


heath20 replied, 7.0 years ago:

Thanks a lot! It is extremely helpful!