Question: Xml In Bioinformatics, Relevance And Uses
14
8.2 years ago by
Thaman3.2k
Finland
Thaman3.2k wrote:

Hi,

We all know how important XML is across the web and in applications. As a foundation for building documents and document systems, XML has real uses in, and influence on, bioinformatics. Examples I have come across so far: PDB XML files and phyloXML.

My main concern is BioXML, which no longer seems to be available; I only find broken links. However, I did find something informative about XML use in bioinformatics applications through [Paul Gordon] link[2].

I found this article to be more informative: An XML application for genomic data interoperation.

So my questions are:

  • Is there any BioXML governing body?
  • How has XML helped bioinformatics in general?
  • What is the impact and relevance of XML in bioinformatics?
  • Besides Paul Gordon's info, PDB XML files and phyloXML, what am I missing?
  • In practice, how can an XML schema be converted into a biological database?
  • What are the big-hitter BioXML applications that have been developed?
  • For what kinds of biological data can XML be set as a standard file format?

Thank you

xml subjective • 4.2k views
ADD COMMENTlink modified 8.2 years ago by Chris Maloney330 • written 8.2 years ago by Thaman3.2k
11
8.2 years ago by
Daniel Swan13k
Aberdeen, UK
Daniel Swan13k wrote:

I think perhaps you're missing the point of XML use in Bioinformatics. Any data can be represented in an XML format, really. The point is about interoperability, data exchange and parsing.

One of the things we have learned from years of flat file formats is that our parsers break with every format change in a new release. Certainly lots of tools now expect the output of BLAST to be in XML format, so that there are no ambiguities in parsing it.

Certainly programmatically, it is a lot easier to interface with an XML document - and as such MANY biological resources release some kind of XML representation of their data - this is true for sequence data, proteomics data or microarray data. The representation in XML is generally community agreed, but unlikely to conform to some 'higher' standard, or indeed, bear any resemblance to any other XML resource, other than the fact that it is XML.

BioXML appears at some point to have been a collection of DTDs for biology. I was not aware of its existence until you mentioned it.
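To make the parsing point concrete, here is a minimal sketch (not part of the original answer) that reads BLAST-style XML with Python's standard xml.etree.ElementTree. The element names follow the NCBI BLAST XML output as far as I know it; the sample document, the identifiers in it, and the function name best_hits are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily abbreviated fragment in the style of NCBI BLAST XML
# output. The hit identifiers and scores are invented for this example.
BLAST_XML = """
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>sp|P00001|EXAMPLE_A</Hit_id>
          <Hit_hsps>
            <Hsp><Hsp_evalue>1e-150</Hsp_evalue></Hsp>
            <Hsp><Hsp_evalue>3e-20</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_id>sp|P00002|EXAMPLE_B</Hit_id>
          <Hit_hsps>
            <Hsp><Hsp_evalue>2e-05</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def best_hits(xml_text):
    """Return (query, hit_id, lowest E-value) for every hit in the document."""
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            # Fields are addressed by element name, so added or reordered
            # elements in a new release do not break this code.
            evalues = [float(hsp.findtext("Hsp_evalue")) for hsp in hit.iter("Hsp")]
            if evalues:
                rows.append((query, hit.findtext("Hit_id"), min(evalues)))
    return rows
```

Calling best_hits(BLAST_XML) yields one tuple per hit. Nothing here depends on column positions or whitespace, which is the usual failure mode of flat-file parsers.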

ADD COMMENTlink written 8.2 years ago by Daniel Swan13k

I should have mentioned how greatly it has helped with parsing.

ADD REPLYlink written 8.2 years ago by Thaman3.2k
11
8.2 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

XML is just a format. It can make things faster but, like any other format, you can't say that it has had an impact on science/bioinformatics.

  • Besides Paul Gordon's info, PDB XML files and phyloXML, what am I missing?

all the schemas defined by the NCBI: http://www.ncbi.nlm.nih.gov/dtd/

  • In practice, how can an XML schema be converted into a biological database?

It depends on what kind of information you want to store in the database. If you just want to put the whole XML document in the database, then you can use something like BerkeleyDB-XML or eXist.

If you want to store the content of your XML in a relational database, you can use XSLT to transform the nodes into a set of SQL queries, e.g. http://code.google.com/p/lindenb/source/browse/trunk/src/xsl/psi2sql.xslt
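The same idea (walk the nodes, emit rows) can be sketched without XSLT. The example below is only an illustration and is not the linked stylesheet: it uses Python's ElementTree and sqlite3 instead of XSLT, and the input document, its element names, the table layout and the function name load_interactions are all invented for this sketch and do not follow PSI-MI or any other published schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input: these element and attribute names are made up.
INTERACTIONS_XML = """
<interactions>
  <interaction id="i1"><protein>P12345</protein><protein>Q67890</protein></interaction>
  <interaction id="i2"><protein>P12345</protein><protein>O11111</protein></interaction>
</interactions>
"""


def load_interactions(xml_text, conn):
    """Insert one row per <interaction> element and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interaction "
        "(id TEXT PRIMARY KEY, protein_a TEXT, protein_b TEXT)"
    )
    rows = []
    for node in ET.fromstring(xml_text).iter("interaction"):
        # This sketch assumes exactly two <protein> children per interaction.
        protein_a, protein_b = [p.text for p in node.findall("protein")]
        rows.append((node.get("id"), protein_a, protein_b))
    # Parameterised inserts avoid building SQL strings by hand.
    conn.executemany("INSERT INTO interaction VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

With conn = sqlite3.connect(":memory:"), load_interactions(INTERACTIONS_XML, conn) fills the table, which can then be queried with ordinary SQL. An XSLT stylesheet that prints INSERT statements does the same mapping declaratively.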

  • For what kinds of biological data can XML be a standard file format?

Any data can be stored in RDF! See http://bio2rdf.org

ADD COMMENTlink modified 8.2 years ago • written 8.2 years ago by Pierre Lindenbaum119k

That is very descriptive, Pierre, thanks :)

ADD REPLYlink written 8.2 years ago by Thaman3.2k

Hi Pierre, just a minor point: in your first point you talk about XML schema, but your example uses a DTD. Do you have a schema-based example? I think the future is in XML Schema, not DTDs.

ADD REPLYlink written 8.2 years ago by Michael Dondrup46k

Yes Michael: "xjc 'http://www.uniprot.org/docs/uniprot.xsd'"

ADD REPLYlink written 8.2 years ago by Pierre Lindenbaum119k

Thanks, I hadn't used JAXB before. I think it's really easy to use.

ADD REPLYlink written 8.2 years ago by Michael Dondrup46k

"in your first point you talk about XML schema, but your example uses a DTD". This seems to be from the common misconception that DTDs and "schema" are distinct. In fact, DTDs are a kind of schema. From Wikipedia, "An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents". The misconception comes from the common use of "XML schema" to refer to W3C's XML Schema language, also referred to as "XSD".

ADD REPLYlink written 7.2 years ago by Chris Maloney330

@Chris, I agree. It is just semantics :-)

ADD REPLYlink written 7.2 years ago by Pierre Lindenbaum119k
7
8.2 years ago by
lh331k
United States
lh331k wrote:

Firstly, +1 to the two posts above. I have learned a lot. Here is an outsider's view:

  • For what kinds of biological data can XML be set as a standard file format?

XML is an ideal format for presenting complex data of small to medium size (e.g. <1GB). It is much clearer and much less error-prone. I see it has the potential to become the standard format for phylogenetic trees, BLAST output, GenBank/EMBL, PDB and networks (where it already is). On the other hand, XML may not be appropriate for very large data. It is also somewhat overkill for simple, structureless data that can be represented in TAB-delimited formats.

  • How has XML helped bioinformatics in general?

XML usually eases format parsing once you get used to XML. As Daniel said, it avoids a lot of pitfalls in parsing formats, especially those formats without a formal spec (e.g. the standard BLAST output and the output of most programs).

However, as an outsider, I also see the following factors that hamper the adoption of XML in Bioinformatics.

  • Memory. After googling a few XML parser benchmarks (e.g. this one), I have the impression that many streaming XML parsers may still use more memory than the size of the file itself. A few parsers (e.g. libxml2/PULL) implement streaming properly, but they are not the default parsers in Perl and Python.

  • Speed. Without any evidence, I tend to believe parsing XML is slower than parsing a plain text file. Parsing XML is almost certainly slower than parsing specialized binary formats, probably a lot. This could be a concern for large data sets.

  • Other factors may be Unix unfriendliness and technical complexity, but perhaps once we get used to XML, these are not major concerns. I do not know.

EDIT: Let me elaborate more on the last bullet.

  • Unix unfriendliness. I know there are tools to convert XML to line-based formats (I have used them). But when we want to open multiple XML files without creating temporary files, it becomes a little painful, though solvable.

  • Technical complexity. I certainly do not mean that using a DOM parser is complex, but using a SAX/StAX/PULL parser is more complicated. The other thing I mean by "complexity" is that XML is overkill for very simple data.
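To illustrate the memory and Unix points together, here is a minimal sketch of a streaming conversion from XML to TAB-delimited lines using iterparse from Python's standard library. It is an illustration under assumptions, not a benchmark: the record and field names (record, id, length) and the function name records_to_tsv are hypothetical, and real formats will need their own tag names.

```python
import io
import xml.etree.ElementTree as ET


def records_to_tsv(source, out, tag="record", fields=("id", "length")):
    """Stream XML from `source`, writing one TAB-delimited line per `tag` element.

    `source` is a filename or a binary file object; `out` is a text stream.
    Returns the number of records written.
    """
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            values = [elem.findtext(name, default="") for name in fields]
            out.write("\t".join(values) + "\n")
            count += 1
            # Discard records that are already written, so memory use does
            # not grow with the size of the file.
            root.clear()
    return count
```

Passing sys.stdin.buffer as source and sys.stdout as out lets the converter sit in an ordinary pipeline with cut, sort or awk, with no temporary files. The explicit event loop is the extra complexity compared with loading the whole tree in one call.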

ADD COMMENTlink modified 8.2 years ago • written 8.2 years ago by lh331k
1

Unix unfriendliness: xsltproc is your XML best friend http://xmlsoft.org/XSLT/xsltproc2.html :-)

ADD REPLYlink written 8.2 years ago by Pierre Lindenbaum119k

+1 except for your last bullet point. 'Unix unfriendliness' is just ill-defined, and technical complexity? Simply not true anymore (see Pierre's answer); I don't really get why you insist on that.

ADD REPLYlink written 8.2 years ago by Michael Dondrup46k

+1 Well mentioned about speed, memory and complexity.

ADD REPLYlink written 8.2 years ago by Thaman3.2k

+1 for the last edit

ADD REPLYlink written 8.2 years ago by Pierre Lindenbaum119k
7
8.2 years ago by
Philadelphia, PA
Jeremy Leipzig18k wrote:

most of the caBIO/caBIG tools at NCI are XML based

one of the best implementations of XML as a biological markup is Tommy Liu's Stanford HIVdb Sierra web service, in which you submit drug resistant HIV sequences programmatically and get back these well structured reports: http://hivdb.stanford.edu/pages/webservices/goodReturn.html

I think XML for Bioinformatics by Ethan Cerami is an underrated book on this subject. Despite being roughly 6 years old it still discusses stuff like Axis! I wrote some errata for this book awhile back: http://jermdemo.blogspot.com/2008/11/errata-to-xml-for-bioinformatics-by.html

A rival to XML is YAML http://en.wikipedia.org/wiki/YAML

ADD COMMENTlink written 8.2 years ago by Jeremy Leipzig18k
1
7.2 years ago by
Hamish3.1k
UK
Hamish3.1k wrote:

For a list of bioinformatics XML formats see: http://www.ebi.ac.uk/Tools/webservices/tutorials/aa_xml_formats

The use of XML has not solved the problem of proliferating data formats. Although XSLT helps with format conversion, there are issues with formats that don't quite map onto each other. There is also the issue that a lot of the common tools don't support XML as input, even if they do support it for output.

ADD COMMENTlink written 7.2 years ago by Hamish3.1k
0
7.2 years ago by
Chris Maloney330
Bethesda, MD
Chris Maloney330 wrote:

I'd suggest that this question is a little bit like asking "Computer files in Bioinformatics -- relevance and uses." XML formats are ubiquitous nowadays. Any attempt to list or catalog all the XML formats used in bioinformatics is obsolete before it's ever finished.

ADD COMMENTlink written 7.2 years ago by Chris Maloney330