Question: How to filter a BLAST XML output?
Manu Prestat (Marseille, France) wrote 8.4 years ago:

Hello list, I have very large BLAST XML outputs (to be used with Blast2GO) and I need to shrink them substantially according to a selected criterion (for instance the E-value). Has anybody written a script for this? If not, I will share mine once it is done!

Thanks,

Emmanuel

EDIT: Thank you for the answers. I'm not used to handling XML files, and I accept your proposal, Egon! So here is a subset of my BLAST XML output:

EDIT 2: I removed the example: it was too big, and combined with the new "markup formatting" it made the question very annoying to read.

It was big, but I had to show you several situations (a multi-HSP hit, for example). I want to be able to filter this file using the HSP statistics.

Here is a detailed example of the output reduction I would like to perform:

How would you delete from the XML the elements (including the tags) between
- <Iteration_stat> and </Iteration_stat>
- <Hsp> and </Hsp> if the HSP E-value (<Hsp_evalue>here</Hsp_evalue>) is > 1e-20
- <Hit> and </Hit> if all of its HSP E-values are > 1e-20
- <Iteration> and </Iteration> if
-- all of its HSP E-values are > 1e-20
-- OR it contains the <Iteration_message>No hits found</Iteration_message> message
?

Thanks again for help and advice.

Emmanuel

filter xml blast parsing • 6.2k views
modified 6.7 years ago • written 8.4 years ago by Manu Prestat

Please add a full snippet of the XML, including the root element and an element you would like to match. That way, people can suggest the proper XPath query.

written 8.4 years ago by Egon Willighagen

When you say filter, do you mean make a smaller XML file, or just extract the key data?

written 8.4 years ago by Peter

I mean a smaller XML file (the goal is to use it with Blast2GO).

written 8.4 years ago by Manu Prestat
Andra Waagmeester (Maastricht, the Netherlands) wrote 8.4 years ago:

XSLT is really nice, but it might look intimidating at first sight. To overcome this, I would suggest using a command-line tool like xmlstarlet (http://xmlstar.sourceforge.net/) to get your results.

When I run the following command line on your example:

cat blastoutput.xml | xmlstarlet sel -t -m //Hsp -v "concat(../../Hit_id,' , ',./Hsp_score, ' , ', ./Hsp_bit-score)" -n

The result is:

gi|17228516|ref|NP_485064.1| , 433 , 171.399648652258
gi|17228516|ref|NP_485064.1| , 233 , 94.3597334687873
gi|159899325|ref|YP_001545572.1| , 433 , 171.399648652258

For other examples, see the documentation: http://xmlstar.sourceforge.net/doc/UG/xmlstarlet-ug.html

The beauty of xmlstarlet is that you can use it to generate your XSLT files. If you include -C, it does not produce the query output; instead it prints the equivalent XSLT stylesheet, which you can then apply with an XSLT processor.
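For readers who would rather script this than use xmlstarlet, the same per-HSP extraction can be sketched in Python with the standard library. This is only an illustration: the SAMPLE document is a hypothetical miniature that follows the BLAST XML tag layout and reuses the values from the result above, and hsp_rows is a helper name made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document: the tag layout follows BLAST XML output, and the
# values are copied from the xmlstarlet result shown above.
SAMPLE = """<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
<Hit>
  <Hit_id>gi|17228516|ref|NP_485064.1|</Hit_id>
  <Hit_hsps>
    <Hsp><Hsp_bit-score>171.399648652258</Hsp_bit-score><Hsp_score>433</Hsp_score></Hsp>
    <Hsp><Hsp_bit-score>94.3597334687873</Hsp_bit-score><Hsp_score>233</Hsp_score></Hsp>
  </Hit_hsps>
</Hit>
<Hit>
  <Hit_id>gi|159899325|ref|YP_001545572.1|</Hit_id>
  <Hit_hsps>
    <Hsp><Hsp_bit-score>171.399648652258</Hsp_bit-score><Hsp_score>433</Hsp_score></Hsp>
  </Hit_hsps>
</Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>"""


def hsp_rows(root):
    """Return one (Hit_id, Hsp_score, Hsp_bit-score) tuple per HSP."""
    rows = []
    for hit in root.iter("Hit"):
        hit_id = hit.findtext("Hit_id", default="")
        for hsp in hit.iter("Hsp"):
            rows.append((hit_id,
                         hsp.findtext("Hsp_score", default=""),
                         hsp.findtext("Hsp_bit-score", default="")))
    return rows


rows = hsp_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(" , ".join(row))
```

Walking from each Hit down to its Hsp children plays the role of the ../../Hit_id step in the xmlstarlet query, since ElementTree elements do not keep a reference to their parent.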

pbpaste | xmlstarlet sel -C -t -m //Hsp -v "concat(../../Hit_id,' , ',./Hsp_score, ' , ', ./Hsp_bit-score)" -n

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/">
    <xsl:for-each select="//Hsp">
      <xsl:value-of select="concat(../../Hit_id,' , ',./Hsp_score, ' , ', ./Hsp_bit-score)"/>
      <xsl:value-of select="'
'"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
modified 5 weeks ago by RamRS • written 8.4 years ago by Andra Waagmeester

Really nice !!!

written 8.4 years ago by Egon Willighagen

This seems to be a very powerful tool: thank you! I have not written the right query yet; I need to learn more about it. How would you delete from the XML the elements (including the tags) between

  • <Iteration_stat> and </Iteration_stat>
  • <Hsp> and </Hsp> if the HSP E-value (<Hsp_evalue>here</Hsp_evalue>) is > 1e-20
  • <Hit> and </Hit> if all of its HSP E-values are > 1e-20
  • <Iteration> and </Iteration> if
    • all of its HSP E-values are > 1e-20
    • OR it contains the <Iteration_message>No hits found</Iteration_message> message?

Emmanuel

written 8.4 years ago by Manu Prestat
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 8.4 years ago:

You can use a simple XSLT stylesheet to filter the HSPs.

For example, see this other question on Biostar, "Standalone Blast Options", where a stylesheet was used to keep only the very first hit.
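As a point of comparison with the XSLT route, here is a minimal Python sketch of the pruning rules listed in the question, using only the standard library rather than a stylesheet. It assumes the usual BLAST XML nesting (BlastOutput_iterations > Iteration > Iteration_hits > Hit > Hit_hsps > Hsp); the SAMPLE document, the filter_blast name, and the choice to treat an HSP without an E-value as failing the filter are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

MAX_EVALUE = 1e-20  # threshold taken from the question

# Hypothetical miniature of a BLAST XML file (most fields omitted).
SAMPLE = """<BlastOutput><BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_hits>
    <Hit><Hit_id>hitA</Hit_id><Hit_hsps>
      <Hsp><Hsp_num>1</Hsp_num><Hsp_evalue>1e-50</Hsp_evalue></Hsp>
      <Hsp><Hsp_num>2</Hsp_num><Hsp_evalue>1e-05</Hsp_evalue></Hsp>
    </Hit_hsps></Hit>
    <Hit><Hit_id>hitB</Hit_id><Hit_hsps>
      <Hsp><Hsp_num>1</Hsp_num><Hsp_evalue>0.001</Hsp_evalue></Hsp>
    </Hit_hsps></Hit>
  </Iteration_hits>
  <Iteration_stat><Statistics/></Iteration_stat>
</Iteration>
<Iteration>
  <Iteration_iter-num>2</Iteration_iter-num>
  <Iteration_stat><Statistics/></Iteration_stat>
  <Iteration_message>No hits found</Iteration_message>
</Iteration>
</BlastOutput_iterations></BlastOutput>"""


def filter_blast(root, max_evalue=MAX_EVALUE):
    """Prune the tree in place, following the four rules in the question."""
    iterations = root.find("BlastOutput_iterations")
    if iterations is None:
        return root
    for iteration in list(iterations):  # iterate over a copy: we remove while looping
        # Rule 1: always drop the per-iteration statistics.
        for stat in iteration.findall("Iteration_stat"):
            iteration.remove(stat)
        message = iteration.findtext("Iteration_message", default="")
        hits = iteration.find("Iteration_hits")
        for hit in list(hits) if hits is not None else []:
            hsps = hit.find("Hit_hsps")
            for hsp in list(hsps) if hsps is not None else []:
                # Rule 2: drop HSPs above the threshold.
                # Assumption: an HSP without an E-value fails the filter.
                if float(hsp.findtext("Hsp_evalue", default="inf")) > max_evalue:
                    hsps.remove(hsp)
            # Rule 3: drop hits left without any HSP.
            if hsps is None or len(hsps) == 0:
                hits.remove(hit)
        # Rule 4: drop iterations with no surviving hit or a "No hits found" message.
        if message.strip() == "No hits found" or hits is None or len(hits) == 0:
            iterations.remove(iteration)
    return root


root = filter_blast(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

For a real file, the same function could be applied with tree = ET.parse("blast.xml"), filter_blast(tree.getroot()) and tree.write("filtered.xml"), where both file names are placeholders. Two caveats: this approach loads the whole document into memory, which may matter for very large outputs, and ElementTree does not write the DOCTYPE line back, so it may need to be restored if a downstream tool expects it.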

modified 5 weeks ago by RamRS • written 8.4 years ago by Pierre Lindenbaum
Egon Willighagen (Maastricht) wrote 8.4 years ago:

An alternative to XSLT is to use an XPath query directly, e.g. with the xpath executable from the libxml-xpath-perl package in Debian GNU/Linux. Once you have added some example XML output, I can add the appropriate XPath query.
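To give a feel for what such path queries look like, here is a small sketch using Python's xml.etree.ElementTree instead of the xpath executable. ElementTree implements only a subset of XPath and has no numeric comparison predicates, so the E-value threshold is applied in Python after the path has selected the HSPs. The SAMPLE document is a hypothetical miniature, not real BLAST output.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a BLAST XML file (most fields omitted).
SAMPLE = """<BlastOutput><BlastOutput_iterations>
<Iteration>
  <Iteration_hits><Hit><Hit_id>hitA</Hit_id><Hit_hsps>
    <Hsp><Hsp_evalue>3e-42</Hsp_evalue></Hsp>
    <Hsp><Hsp_evalue>0.5</Hsp_evalue></Hsp>
  </Hit_hsps></Hit></Iteration_hits>
</Iteration>
<Iteration>
  <Iteration_message>No hits found</Iteration_message>
</Iteration>
</BlastOutput_iterations></BlastOutput>"""

root = ET.fromstring(SAMPLE)

# Path with a predicate on a child's text: iterations flagged as having no hits.
empty_iterations = root.findall(".//Iteration[Iteration_message='No hits found']")

# The path selects every HSP; the E-value threshold is then applied in Python.
significant = [hsp for hsp in root.findall(".//Hsp")
               if float(hsp.findtext("Hsp_evalue", default="inf")) <= 1e-20]

print(len(empty_iterations), "iteration(s) without hits")
print([hsp.findtext("Hsp_evalue") for hsp in significant])
```

These queries only select elements; producing a smaller XML file still requires removing the unwanted elements from their parents and writing the tree back out.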

written 8.4 years ago by Egon Willighagen