Biopython - The new RefSeq release from NCBI and Bio.Entrez.Parser compatibility?
10.2 years ago
Iñaki ▴ 20

Hello, I'm new to Python and especially to Biopython. I'm trying to fetch a record as XML with Entrez.efetch and then parse it. Last week this script worked well:

from Bio import Entrez

handle = Entrez.efetch(db="Protein", id="YP_008872780.1", retmode="xml")
records = Entrez.read(handle)

But now I'm getting an error:

Bio.Entrez.Parser.ValidationError: Failed to find tag 'GBSeq_xrefs' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.

So I ran this:

records = Entrez.read(handle, validate=False)

But I still get an error:

TypeError: 'str' object does not support item assignment

After some research I realized that NCBI recently made changes to RefSeq that add new tags to the GenPept XML output: http://www.ncbi.nlm.nih.gov/mailman/pipermail/refseq-announce/2014q2/000117.html

Do I need to change something in the DTD to support these new tags?

Thank you very much for your support.

Biopython NCBI Entrez xml RefSeq • 4.3k views
10.2 years ago
Zhaorong ★ 1.4k

The DTD used by Bio.Entrez is out of date.

Download the DTD from here

Put it in the Bio.Entrez DTDs folder.

To find the location of the folder:

>>> from Bio import Entrez
>>> Entrez.__file__

The folder is xxxxxxxxxxxxxxxxxxxx/Bio/Entrez/DTDs
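The steps above can be sketched in Python. This is only an illustration, not part of Biopython: `dtd_folder` and `install_dtd` are hypothetical helper names, and the sketch assumes the DTDs folder sits next to the `Bio/Entrez/__init__.py` file, as described above. It keeps a `.bak` copy of any file it replaces.

```python
import os
import shutil


def dtd_folder(module_file):
    """Return the DTDs folder next to a module file such as Bio/Entrez/__init__.py."""
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), "DTDs")


def install_dtd(downloaded_dtd, module_file):
    """Copy a downloaded DTD into the DTDs folder, backing up any file it replaces."""
    target = os.path.join(dtd_folder(module_file), os.path.basename(downloaded_dtd))
    if os.path.exists(target):
        # Keep the old DTD so the change can be undone.
        shutil.copy2(target, target + ".bak")
    shutil.copy2(downloaded_dtd, target)
    return target


if __name__ == "__main__":
    try:
        from Bio import Entrez
    except ImportError:
        print("Biopython is not installed")
    else:
        # Prints the folder where the downloaded DTD should go.
        print(dtd_folder(Entrez.__file__))
```

Calling `install_dtd(path_to_downloaded_file, Entrez.__file__)` would then copy the file into place; writing into the installed package may need administrator rights depending on how Biopython was installed.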


It is unfortunate that NCBI edited this DTD file in place; normally they are very good about adding new dated versions instead. In any case, the Biopython copy has already been updated (https://github.com/biopython/biopython/commit/9a301b5d1cecad1bb2fee3920f73740448f9aa4f), but that happened shortly after the Biopython 1.63 release :(
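To check whether an installed Biopython is likely to include the updated DTD, one could compare its version against 1.63. The sketch below is a rough illustration: `version_tuple` and `has_updated_dtd` are hypothetical helpers, and it assumes that every release after 1.63 bundles the commit linked above, which is worth verifying for your own installation.

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.63' into a tuple of its leading numbers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no numeric version found in %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def has_updated_dtd(version, last_affected="1.63"):
    """Assumption: releases after `last_affected` ship the updated DTD."""
    return version_tuple(version) > version_tuple(last_affected)


if __name__ == "__main__":
    try:
        import Bio
    except ImportError:
        print("Biopython is not installed")
    else:
        print(Bio.__version__, has_updated_dtd(Bio.__version__))
```

If the check reports an older version, upgrading Biopython or copying the DTD by hand as described in the answer are the two options.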


It works. I didn't know where to find a new version of the DTD file.

Thank you very much! :)
