Parse Ncbi Eutils Xml To Csv
2
1
Entering edit mode
9.2 years ago

Do tools exist for parsing specifically NCBI Eutils XML to CSV?

This is related to tools parsing NCBI blast -m 7 xml output format?

An example file is available at : https://raw.github.com/low-decarie/stack_examples/master/ncbi_example.xml

I would like to associate the accession numbers with basic metadata on the study (title,alt_elev,environment,envo_biome,envo_feature,lat,lon,lat_long, number of reads).

xml eutils ncbi • 3.6k views
ADD COMMENT
1
Entering edit mode

your XML for SRA is a structured/complex document: it cannot be converted to CSV unless you explain us what kind of CSV data is expected, or what you want to achieve with it.

ADD REPLY
2
Entering edit mode
9.2 years ago

Given that this is SRA xml, you might consider using SRAdb (Bioconductor). There is a separately packaged sqlite database into which we have placed all the SRA metadata. You can use the R package to do the queries or simply use any sqlite client on the same sqlite file. Documentation for the R package includes multiple example queries.

http://www.bioconductor.org/packages/release/bioc/html/SRAdb.html

ADD COMMENT
0
Entering edit mode

That is excellent! Wish I had know of this sooner, but I currently have all the XML files on hand and would like to parse those (into a csv or into an sqlite db would be fine. Thank you

ADD REPLY
2
Entering edit mode
9.2 years ago

Inserting a complex XML like this is complicated. You should rather keep the whole XML structure and query it with xslt or xquery. However, here is a XSLT stylesheet as an example that would insert 3 tables in sqlite3. Morevover, the single quotes in the TEXT field should be escaped.

Usage:

xsltproc sra2sqlite.xsl sra.xml | sqlite3 sra.sqlite

.

~$ sqlite3 sra.sqlite ' select * from EXPERIMENT limit 10'
1|NSZ2O1P00|MCDB, U of Colorado|SRX159081
2|NSZ1O1P00|MCDB, U of Colorado|SRX159080
3|NSY2O1P00|MCDB, U of Colorado|SRX159079
4|NSU3O1P00|MCDB, U of Colorado|SRX159078
5|NSU2O1P00|MCDB, U of Colorado|SRX159077
6|NST2O1P00|MCDB, U of Colorado|SRX159076
7|NST1O1P00|MCDB, U of Colorado|SRX159075
8|NSQ3O1P00|MCDB, U of Colorado|SRX159074
9|NSP3O1P00|MCDB, U of Colorado|SRX159073
10|NSP1O1P00|MCDB, U of Colorado|SRX159072
ADD COMMENT
0
Entering edit mode

Oracle/Berkeley XML db is quite useful for storing and querying xml. Here is a reasonably nice (though dated) walkthrough:

http://www.xml.com/pub/a/2008/05/07/under-the-hood-oracle-berkeley-db-xml.html

ADD REPLY
0
Entering edit mode

@Sean I tested BerkeleyXML and eXist: the second was lighter and easier to use.

ADD REPLY

Login before adding your answer.

Traffic: 1339 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6