How to filter a BLAST XML output?
13.4 years ago

Hello list, I have very large BLAST XML outputs (to be used with blast2go) and I need to shrink them considerably according to a chosen criterion (for instance the E-value). Has anybody written a script for this? If not, I will share mine once it is done!

Thanks,

Emmanuel

EDIT: Thank you for the answers. I'm not used to handling XML files, and I accept your proposal, Egon! So here is a subset of my BLAST XML output: EDIT 2: I removed the example: it was too big, and combined with the new "markup formatting" it made the question very annoying to read.

It's big but I had to show you several situations (a multi-hsp hit for example). So I want to be able to filter this file using the hsp statistics.

Here is a detailed example of the output shrinkage I would like to conduct:

How would you delete the following elements from the XML (including the tags themselves)?
- <Iteration_stat> and </Iteration_stat>
- <Hsp> and </Hsp> if the HSP e-value (<Hsp_evalue>here</Hsp_evalue>) is > 1e-20
- <Hit> and </Hit> if all of its HSP e-values are > 1e-20
- <Iteration> and </Iteration> if
-- all of its HSP e-values are > 1e-20
-- OR it contains the <Iteration_message>No hits found</Iteration_message> message

Thanks again for help and advice.

Emmanuel

xml blast filter parsing • 8.6k views

Please add a full snippet of the XML, including the root element and an element you would like to match. That way, people can suggest the proper XPath query.


When you say filter, do you mean make a smaller XML file, or just extract the key data?


I mean a smaller XML file (the goal is to use blast2go).

13.4 years ago

XSLT is really nice, but it might look intimidating at first sight. To overcome this, I would suggest using a command-line tool like xmlstarlet (http://xmlstar.sourceforge.net/) to get your results.

When I run the following command line on your example:

cat blastoutput.xml | xmlstarlet sel -t -m //Hsp -v "concat(../../Hit_id,' , ',./Hsp_score, ' , ', ./Hsp_bit-score)" -n

The result is:

gi|17228516|ref|NP_485064.1| , 433 , 171.399648652258
gi|17228516|ref|NP_485064.1| , 233 , 94.3597334687873
gi|159899325|ref|YP_001545572.1| , 433 , 171.399648652258
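
If xmlstarlet is not available, the same selection can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration, not part of the original answer: the SAMPLE string is a hypothetical minimal document built from the three values shown above, and hsp_rows is a made-up helper name. It assumes the usual BLAST XML layout, where each Hsp sits under Hit/Hit_hsps.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal sample, built only from the values shown above.
SAMPLE = """<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit><Hit_id>gi|17228516|ref|NP_485064.1|</Hit_id><Hit_hsps>
  <Hsp><Hsp_bit-score>171.399648652258</Hsp_bit-score><Hsp_score>433</Hsp_score></Hsp>
  <Hsp><Hsp_bit-score>94.3597334687873</Hsp_bit-score><Hsp_score>233</Hsp_score></Hsp>
</Hit_hsps></Hit>
<Hit><Hit_id>gi|159899325|ref|YP_001545572.1|</Hit_id><Hit_hsps>
  <Hsp><Hsp_bit-score>171.399648652258</Hsp_bit-score><Hsp_score>433</Hsp_score></Hsp>
</Hit_hsps></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""


def hsp_rows(root):
    """Yield (Hit_id, Hsp_score, Hsp_bit-score) for every HSP, in document order."""
    for hit in root.iter("Hit"):
        hit_id = hit.findtext("Hit_id")
        # Same relationship as ../../Hit_id in the XPath: Hsp -> Hit_hsps -> Hit.
        for hsp in hit.iterfind("Hit_hsps/Hsp"):
            yield hit_id, hsp.findtext("Hsp_score"), hsp.findtext("Hsp_bit-score")


# Print one " , "-separated line per HSP, like the concat() call above.
for row in hsp_rows(ET.fromstring(SAMPLE)):
    print(" , ".join(row))
```

For a real file, ET.parse("blastoutput.xml").getroot() would replace ET.fromstring(SAMPLE).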

For other examples, see the documentation: http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html

The beauty of xmlstarlet is that you can use it to generate your XSLT files. If you include -C, it does not produce the output but prints the XSLT stylesheet instead, which you can reuse when you want to apply XSLT.

pbpaste | xmlstarlet sel -C -t -m //Hsp -v "concat(../../Hit_id,' , ',./Hsp_score, ' , ', ./Hsp_bit-score)" -n

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/">
    <xsl:for-each select="//Hsp">
      <xsl:value-of select="concat(../../Hit_id,' , ',./Hsp_score, ' , ', ./Hsp_bit-score)"/>
      <xsl:value-of select="'&#10;'"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

Really nice !!!


This seems to be a very powerful tool: thank you! I have not written the right query yet; I need to learn more about it first. How would you delete the following elements from the XML (including the tags themselves)?

  • <Iteration_stat> and </Iteration_stat>
  • <Hsp> and </Hsp> if the HSP e-value (<Hsp_evalue>here</Hsp_evalue>) is > 1e-20
  • <Hit> and </Hit> if all of its HSP e-values are > 1e-20
  • <Iteration> and </Iteration> if
      • all of its HSP e-values are > 1e-20
      • OR there is the <Iteration_message>No hits found</Iteration_message> message

Emmanuel

13.4 years ago

You can use a simple XSLT stylesheet to filter the HSPs.

For example, see the other Biostar question "Standalone Blast Options", where a stylesheet was used to keep only the very first Hit.

13.4 years ago

An alternative to XSLT is to use an XPath query directly, e.g. with the xpath executable from the libxml-xpath-perl package in Debian GNU/Linux. Once you have added some example XML output, I can add the appropriate XPath query.
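
For the filtering rules listed in the question, here is a minimal sketch in Python using the standard xml.etree.ElementTree module and its limited XPath support. It is not the xpath executable mentioned above, and nobody in this thread posted it. It assumes the standard BLAST XML layout (BlastOutput/BlastOutput_iterations/Iteration/Iteration_hits/Hit/Hit_hsps/Hsp); filter_blast, MAX_EVALUE and the SAMPLE document (with made-up hit names and e-values) are hypothetical names and data for illustration only.

```python
import xml.etree.ElementTree as ET

MAX_EVALUE = 1e-20  # threshold taken from the question

# Hypothetical sample covering the cases in the question (made-up ids and e-values).
SAMPLE = """<BlastOutput><BlastOutput_iterations>
<Iteration>
  <Iteration_hits>
    <Hit><Hit_id>hitA</Hit_id><Hit_hsps>
      <Hsp><Hsp_evalue>1e-40</Hsp_evalue></Hsp>
      <Hsp><Hsp_evalue>1e-05</Hsp_evalue></Hsp>
    </Hit_hsps></Hit>
    <Hit><Hit_id>hitB</Hit_id><Hit_hsps>
      <Hsp><Hsp_evalue>0.001</Hsp_evalue></Hsp>
    </Hit_hsps></Hit>
  </Iteration_hits>
  <Iteration_stat><Statistics/></Iteration_stat>
</Iteration>
<Iteration>
  <Iteration_hits></Iteration_hits>
  <Iteration_message>No hits found</Iteration_message>
</Iteration>
<Iteration>
  <Iteration_hits>
    <Hit><Hit_id>hitC</Hit_id><Hit_hsps>
      <Hsp><Hsp_evalue>2.5</Hsp_evalue></Hsp>
    </Hit_hsps></Hit>
  </Iteration_hits>
</Iteration>
</BlastOutput_iterations></BlastOutput>"""


def filter_blast(root, max_evalue=MAX_EVALUE):
    """Prune a parsed BLAST XML tree in place, following the rules in the question."""
    for iterations in root.findall("BlastOutput_iterations"):
        for iteration in iterations.findall("Iteration"):
            # Drop the <Iteration_stat> block.
            for stat in iteration.findall("Iteration_stat"):
                iteration.remove(stat)
            for hits in iteration.findall("Iteration_hits"):
                for hit in hits.findall("Hit"):
                    for hsps in hit.findall("Hit_hsps"):
                        for hsp in hsps.findall("Hsp"):
                            # Drop HSPs above the threshold; an HSP without
                            # <Hsp_evalue> is treated as infinite (assumption).
                            if float(hsp.findtext("Hsp_evalue", "inf")) > max_evalue:
                                hsps.remove(hsp)
                    # Drop hits that have no HSP left.
                    if hit.find("Hit_hsps/Hsp") is None:
                        hits.remove(hit)
            # Drop iterations flagged "No hits found" or left without any hit.
            message = iteration.findtext("Iteration_message", "").strip()
            if message == "No hits found" or iteration.find("Iteration_hits/Hit") is None:
                iterations.remove(iteration)
    return root


filtered = filter_blast(ET.fromstring(SAMPLE))
print(ET.tostring(filtered, encoding="unicode"))
```

On a real file, you would load it with ET.parse("blastoutput.xml"), call filter_blast on tree.getroot(), and save with tree.write(...). Note that ElementTree does not keep the DOCTYPE line when writing; I have not checked whether blast2go needs it. For very large outputs, an incremental parser such as ET.iterparse may be a better fit than loading the whole tree.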
