Question: Biopython - The new RefSeq release from NCBI and Bio.Entrez.Parser compatibility?
Iñaki (France, Bordeaux) wrote 5.9 years ago:

Hello, I'm new to Python and especially to Biopython. I'm trying to fetch a protein record as XML with Entrez.efetch and then parse it. Last week this script worked well:

from Bio import Entrez

handle = Entrez.efetch(db="Protein", id="YP_008872780.1", retmode="xml")
records = Entrez.read(handle)

But now I'm getting an error:

Bio.Entrez.Parser.ValidationError: Failed to find tag 'GBSeq_xrefs' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.

So I ran this:

records = Entrez.read(handle, validate=False)

But I'm still getting an error:

TypeError: 'str' object does not support item assignment

After some research I found that NCBI recently made changes to RefSeq that add new tags to the GenPept XML output: http://www.ncbi.nlm.nih.gov/mailman/pipermail/refseq-announce/2014q2/000117.html

Do I need to change something in the DTD to support these new tags?

Thank you very much for your support.

entrez biopython refseq xml ncbi • 2.8k views
modified 5.9 years ago by Zhaorong (1.2k) • written 5.9 years ago by Iñaki (20)
Zhaorong (State College, PA) wrote 5.9 years ago:

The DTD used by Bio.Entrez is out of date.

Download the DTD from here

Put it in the Bio.Entrez DTDs folder.

To find the location of the folder:

>>> from Bio import Entrez
>>> Entrez.__file__

The folder is xxxxxxxxxxxxxxxxxxxx/Bio/Entrez/DTDs
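
The steps above could be scripted roughly as follows. This is only a sketch: the helper names dtd_folder and install_dtd are made up for illustration, and it assumes you have already downloaded the updated DTD to a local file and that the DTDs folder sits next to the package's __init__ file, as described above. It keeps a .bak copy of any file it overwrites.

```python
import os
import shutil


def dtd_folder(package_file):
    """Return the DTDs folder that sits next to a package's __init__ file.

    package_file is meant to be Entrez.__file__ (hypothetical helper).
    """
    return os.path.join(os.path.dirname(os.path.abspath(package_file)), "DTDs")


def install_dtd(downloaded_dtd, package_file):
    """Copy a downloaded DTD into the DTDs folder, backing up any old copy.

    Returns the path of the installed file (hypothetical helper).
    """
    target = os.path.join(dtd_folder(package_file),
                          os.path.basename(downloaded_dtd))
    if os.path.exists(target):
        # Keep the outdated DTD so the change can be undone.
        shutil.copy2(target, target + ".bak")
    shutil.copy2(downloaded_dtd, target)
    return target
```

With Biopython installed, you would call install_dtd("path/to/downloaded.dtd", Entrez.__file__) after "from Bio import Entrez". You may need write permission on the Biopython installation folder for the copy to succeed.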

modified 3 months ago by RamRS (26k) • written 5.9 years ago by Zhaorong (1.2k)

It is unfortunate that NCBI edited this DTD file in place; normally they are very good about adding new dated versions instead. In any case, the Biopython copy has already been updated (https://github.com/biopython/biopython/commit/9a301b5d1cecad1bb2fee3920f73740448f9aa4f), but that happened shortly after the Biopython 1.63 release :(
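
To tell whether an installation is likely affected, one rough check is to compare the installed version against 1.63. The sketch below assumes that releases after 1.63 ship the updated DTD (which follows from the commit date but is not verified here); the function name is hypothetical.

```python
import re


def bundled_dtd_predates_fix(version):
    """Return True if a Biopython version string is 1.63 or older.

    Assumption: the updated DTD is bundled only in releases after 1.63.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) <= (1, 63)
```

With Biopython installed, the string to pass in is Bio.__version__. If the function returns True, replacing the DTD by hand as described in the answer above is the workaround.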

modified 3 months ago by RamRS (26k) • written 5.9 years ago by Peter (5.8k)

It works. I didn't know where to find a new version of the DTD file.

Thank you very much! :)

written 5.9 years ago by Iñaki (20)