Question: Find Amino Acid Change For Snp Using Eutils
1
gravatar for Dpsguy
9.0 years ago by
Dpsguy140
Dpsguy140 wrote:

Hi, I am just starting out with Entrez E-utilities through Biopython. I need to find the amino acid change for each rsID in a list of missense SNPs, and I cannot figure out how to do that. I guess the answer lies in the XML returned by this query:

handle = Entrez.efetch(db="snp", id="6046", retmode="xml")

But when I try

record = Entrez.read(handle)

It gives me an error like: The Bio.Entrez parser cannot handle XML data that make use of XML namespaces.

I don’t know why this is happening. Maybe I am missing something obvious here…

Is it even possible to get my required information using eutils? If not, can you suggest any other means (except doing it manually for every SNP)?

Thanks in advance.

Tags: eutils, dbsnp, biopython, snp
Answer by Martijn Vermaat (9.0 years ago):

This works for me:

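from xml.dom import minidom
from Bio import Entrez
response = Entrez.efetch(db='SNP', id='6046', rettype='flt', retmode='xml')
doc = minidom.parseString(response.read())  # parse with minidom instead of Entrez.read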

There is possibly more than one amino acid change associated with the SNP, but you can get the annotated ones from your response by looking in the RsStruct elements (or from the HGVS descriptions on NP references in the hgvs elements). E.g. calling .getElementsByTagName('hgvs') on the parsed document could be the first step. Consult some general documentation on XML DOM navigation if you need more information.
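As a rough illustration of that first step, here is a minimal sketch that runs against a small hand-written XML sample rather than a live efetch response. The sample document, its namespace URI, the accessions, and the helper name `protein_hgvs` are all made up for illustration; only the `hgvs` tag name and the `NP_` prefix come from this thread.

```python
from xml.dom import minidom

# Hypothetical, hand-written sample; a real efetch response is much larger.
SAMPLE_XML = """<ExchangeSet xmlns="http://example.org/snp">
  <Rs rsId="6046">
    <hgvs>NM_000000.0:c.1238G&gt;A</hgvs>
    <hgvs>NP_000000.0:p.Arg413Gln</hgvs>
    <hgvs>NP_000000.0:p.Arg413Leu</hgvs>
  </Rs>
</ExchangeSet>"""


def protein_hgvs(xml_text):
    """Return the text of every <hgvs> element that names an NP_ protein record."""
    doc = minidom.parseString(xml_text)
    names = []
    for node in doc.getElementsByTagName("hgvs"):
        # firstChild is the text node inside <hgvs>...</hgvs>; skip empty elements.
        if node.firstChild is None:
            continue
        text = node.firstChild.nodeValue.strip()
        if text.startswith("NP_"):
            names.append(text)
    return names


print(protein_hgvs(SAMPLE_XML))
```

With a real response, the string passed to `protein_hgvs` would be whatever `response.read()` returns.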

(Comment by Martijn Vermaat, 9.0 years ago)

Thanks for the tip! It seems etree can also do the job. But back to my original question: how do I get the amino acid change from this XML? I am not very familiar with XML and was relying on the Entrez parser to do the job for me. I have no experience with etree or minidom.
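For the etree route, a minimal sketch might look like the following. It assumes the response declares a default XML namespace (the Bio.Entrez error suggests namespaces are in use), so it compares local tag names instead of hard-coding a namespace URI. The sample XML, the accessions, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample; a real efetch response is much larger.
SAMPLE_XML = """<ExchangeSet xmlns="http://example.org/snp">
  <Rs rsId="6046">
    <hgvs>NM_000000.0:c.1238G&gt;A</hgvs>
    <hgvs>NP_000000.0:p.Arg413Gln</hgvs>
  </Rs>
</ExchangeSet>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def hgvs_names(xml_text):
    """Return the text of every <hgvs> element, whatever namespace it is in."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if local_name(el.tag) == "hgvs" and el.text
    ]


print(hgvs_names(SAMPLE_XML))
```

Matching on the local name keeps the sketch independent of the exact namespace URI NCBI uses, which is not given in this thread.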

(Comment by Dpsguy, 9.0 years ago)

Answer by Peter (9.0 years ago):

Which version of Biopython do you have? Mine is the latest and it says:

NotImplementedError: The Bio.Entrez parser cannot handle XML data that make use of XML namespaces

You can try another Python XML parser instead. For some reason the NCBI give very different XML back for the SNP database than all their other databases, and the Bio.Entrez parser can't cope: https://redmine.open-bio.org/issues/2771

Interestingly, you can try putting http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=6046&retmode=xml into validators like http://www.validome.org/xml/validate/ (says it might be OK) or http://validator.w3.org/, which says it's invalid.


I don’t think using another parser would help. From http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html :

eFetch utility generates an invalid XML for SNP, so currently it doesn't work through SOAP. The bug is being fixed.

This page seems to have been last updated in 2009, though. Too long a time to get a bug fixed.

So what other options do I have?

(Comment by Dpsguy, 9.0 years ago)

First of all, tell the NCBI about this; it will help them rank priorities if they know how many people are having trouble with it. Also check what other formats they offer for the SNP database...

(Comment by Peter, 9.0 years ago)

I wrote to NCBI and the reply was: "SNP data is also available through SOAP web service, which requires this snp specific efetch wsdl: http://eutils.ncbi.nlm.nih.gov/soap/v2.0/efetch_snp.wsdl How the XML object is requested and parsed by the bio.python is more a question for its developers since we do not have resources to trouble shoot third party software."

The best direct query according to them is http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=6046&rettype=xml&retmode=text
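For a list of rsIDs, the same query string can be assembled programmatically. The sketch below only builds the URL and does not contact NCBI. The helper name `snp_efetch_url` is hypothetical, and it assumes the endpoint accepts several IDs as a comma-separated `id` value.

```python
from urllib.parse import urlencode

# Base URL taken from the query quoted above.
EFETCH_URL = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def snp_efetch_url(rsids):
    """Build the direct efetch URL for one or more rsIDs (digits only, no 'rs' prefix)."""
    params = {
        "db": "snp",
        "id": ",".join(str(r) for r in rsids),  # assumed: comma-separated IDs
        "rettype": "xml",
        "retmode": "text",
    }
    # safe="," keeps the commas between IDs readable instead of encoding them as %2C.
    return EFETCH_URL + "?" + urlencode(params, safe=",")


print(snp_efetch_url(["6046"]))
```

The resulting URL can then be fetched with any HTTP client and the body handed to an XML parser.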

(Comment by Dpsguy, 9.0 years ago)

@Peter: Yes, you are right... it does give the error that you mentioned. I have edited my question accordingly.

(Comment by Dpsguy, 9.0 years ago)

But all this talk about invalid xml and parsers does nothing to answer my original question that is in the title. Now that I have the parsed xml using minidom (see below), how do I use that to get the amino acid change for a mutation?

(Comment by Dpsguy, 9.0 years ago)

Answer by Daniel E Cook (7.3 years ago):

I wrote a function to parse the data from flat files. This is a work in progress, but maybe this can be of some help to someone:

Answer by Dpsguy (8.9 years ago):

I guess I found a workable solution using the hints provided by Martijn Vermaat. I reproduce my code below:

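import re
from xml.dom import minidom
from Bio import Entrez

rsid = '6046'
res = minidom.parseString(Entrez.efetch(db='snp', id=rsid, retmode='xml').read())
# Protein-level HGVS names end in something like "p.Arg413Gln":
# three-letter reference residue, position, three-letter new residue.
pattern = re.compile(r'p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})')
found = False
for node in res.getElementsByTagName('hgvs'):
    val = node.firstChild.nodeValue
    # Only NP_ (protein) records carry an amino acid change.
    match = pattern.search(val) if 'NP_' in val else None
    if match:
        found = True
        ref, pos, alt = match.groups()
        print(ref + " > " + alt + " Position: " + pos)
if not found:
    print("SNP not in coding region")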

The output is the following:

Arg > Gln Position: 413
Arg > Leu Position: 413
Arg > Pro Position: 413
Arg > Gln Position: 391
Arg > Leu Position: 391
Arg > Pro Position: 391

If anyone can provide a better method or code, your suggestions are most welcome.
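One possible tightening, offered as a sketch rather than a definitive method, is to move the string handling into a small function with a single anchored regular expression, so it can be checked without any network access. The function names and the `NP_000000.0` accession are hypothetical, and the pattern only covers simple three-letter substitutions like those in the output above.

```python
import re

# Accession, then "p.", reference residue, position, new residue.
PROTEIN_CHANGE = re.compile(
    r"^(NP_\d+(?:\.\d+)?):p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$"
)


def parse_protein_hgvs(name):
    """Return (accession, ref, position, alt), or None if the name does not match."""
    match = PROTEIN_CHANGE.match(name.strip())
    if match is None:
        return None
    accession, ref, pos, alt = match.groups()
    return accession, ref, int(pos), alt


def format_change(name):
    """Render a protein HGVS name in the 'Arg > Gln Position: 413' style used above."""
    parsed = parse_protein_hgvs(name)
    if parsed is None:
        return None
    _, ref, pos, alt = parsed
    return "{} > {} Position: {}".format(ref, alt, pos)


print(format_change("NP_000000.0:p.Arg413Gln"))
```

Names that do not fit the pattern, such as transcript-level (`NM_`) descriptions, return `None` instead of raising an `IndexError`.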


Guess I'll be immodest and accept my own answer.

(Comment by Dpsguy, 8.9 years ago)