Question: Tools for parsing NCBI BLAST -m 7 XML output format?
Lhl (United States) wrote, 9.5 years ago:

Hi all,

Is there a script or tool that can parse NCBI BLAST XML output (produced with the -m 7 option)?

I want a tab delimited file containing the following information:

 1. Name of the query sequence             Seq1
 2. Length of the query sequence           30
 3. Name of target sequence                gnl|BL_ORD_ID|0
 4. Length of target sequence              5528445
 5. Alignment bit score                    59.96
 6. E-value                                8.38112e-11
 7. Start of alignment within query        1
 8. End of alignment within query          30
 9. Start of alignment within target       5436010
10. End of alignment within target         5436039
11. Query frame                            1
12. Target frame                           1
13. Number of identical bases within       29
    the alignment
14. Alignment length                       30
15. Aligned portion (sequence) of query    CGGACAGCGCCGCCACCAACAAAGCCACCA
16. Aligned portion (sequence) of target   CGGACAGCGCCGCCACCAACAAAGCCATCA
17. Midline indicating positions of        ||||||||||||||||||||||||||| ||
    matches within the alignment

Thanks.

Elzed

xml blast parsing • 15k views
Neilfws (Sydney, Australia) wrote, 9.5 years ago:

All of the major Bio* projects contain libraries to parse BLAST XML output:

  • BioPerl - use the SearchIO module with the option -format => 'blastxml'
  • Biopython - its tutorial recommends using XML output for parsing
  • BioRuby - Bio::Blast.reports will read an XML file

Once you figure out how to extract the required fields, writing to CSV is quite easy in any of these languages.
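
For readers who would rather avoid installing one of the Bio* libraries, the same extraction can be sketched with only the Python standard library. This is a minimal sketch, not any library's method: the function names `blast_xml_to_rows` and `write_tsv` and the miniature `SAMPLE` document are made up for illustration, and it assumes the element names of the NCBI BLAST XML format (`Hit_def`, `Hsp_bit-score` and so on). It streams the file with `iterparse` because the files in question hold millions of hits. Swap `Hit_def` for `Hit_id` if you want identifiers such as gnl|BL_ORD_ID|0 as the target name.

```python
import io
import sys
import xml.etree.ElementTree as ET

# Per-HSP elements, in the column order requested in the question (fields 5-17).
HSP_FIELDS = [
    "Hsp_bit-score", "Hsp_evalue",
    "Hsp_query-from", "Hsp_query-to",
    "Hsp_hit-from", "Hsp_hit-to",
    "Hsp_query-frame", "Hsp_hit-frame",
    "Hsp_identity", "Hsp_align-len",
    "Hsp_qseq", "Hsp_hseq", "Hsp_midline",
]


def blast_xml_to_rows(source):
    """Yield one list of 17 strings per HSP in a BLAST XML document.

    `source` is a file name or a binary file object.
    """
    # Assumption: use the document-level query name and length whenever an
    # Iteration has no Iteration_query-def / Iteration_query-len of its own.
    default_def = default_len = ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "BlastOutput_query-def":
            default_def = elem.text or ""
        elif elem.tag == "BlastOutput_query-len":
            default_len = elem.text or ""
        elif elem.tag == "Iteration":
            query_def = elem.findtext("Iteration_query-def", default_def)
            query_len = elem.findtext("Iteration_query-len", default_len)
            for hit in elem.iterfind("Iteration_hits/Hit"):
                hit_def = hit.findtext("Hit_def", "")
                hit_len = hit.findtext("Hit_len", "")
                for hsp in hit.iterfind("Hit_hsps/Hsp"):
                    yield [query_def, query_len, hit_def, hit_len] + [
                        hsp.findtext(tag, "") for tag in HSP_FIELDS
                    ]
            elem.clear()  # drop the processed Iteration to keep memory flat


def write_tsv(source, out):
    """Write one tab-separated line per HSP to the text stream `out`."""
    for row in blast_xml_to_rows(source):
        out.write("\t".join(row) + "\n")


# Made-up miniature document that reuses the values from the question.
SAMPLE = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_query-def>Seq1</BlastOutput_query-def>
  <BlastOutput_query-len>30</BlastOutput_query-len>
  <BlastOutput_iterations><Iteration><Iteration_hits><Hit>
    <Hit_id>gnl|BL_ORD_ID|0</Hit_id>
    <Hit_def>example target</Hit_def>
    <Hit_len>5528445</Hit_len>
    <Hit_hsps><Hsp>
      <Hsp_bit-score>59.96</Hsp_bit-score>
      <Hsp_evalue>8.38112e-11</Hsp_evalue>
      <Hsp_query-from>1</Hsp_query-from>
      <Hsp_query-to>30</Hsp_query-to>
      <Hsp_hit-from>5436010</Hsp_hit-from>
      <Hsp_hit-to>5436039</Hsp_hit-to>
      <Hsp_query-frame>1</Hsp_query-frame>
      <Hsp_hit-frame>1</Hsp_hit-frame>
      <Hsp_identity>29</Hsp_identity>
      <Hsp_align-len>30</Hsp_align-len>
      <Hsp_qseq>CGGACAGCGCCGCCACCAACAAAGCCACCA</Hsp_qseq>
      <Hsp_hseq>CGGACAGCGCCGCCACCAACAAAGCCATCA</Hsp_hseq>
      <Hsp_midline>||||||||||||||||||||||||||| ||</Hsp_midline>
    </Hsp></Hit_hsps>
  </Hit></Iteration_hits></Iteration></BlastOutput_iterations>
</BlastOutput>
"""

write_tsv(io.BytesIO(SAMPLE), sys.stdout)
```

For a real file, pass its name instead of the sample, for example `write_tsv("blast.xml", sys.stdout)`, where `blast.xml` is a placeholder name.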

Also, don't forget that running blastall with the -m 8 or -m 9 options will generate tab-delimited output (but if I recall correctly, not including the aligned sequences, which you need).


And the minor ones, too! http://hackage.haskell.org/packages/archive/bio/0.5.0.1/doc/html/Bio-Alignment-BlastXML.html

Reply written 9.5 years ago by Ketil

Thanks, Neilfws. I already have the XML files, which other annotation software requires, and they contain millions of sequences, so I do not want to wait weeks to redo the BLAST runs with -m 8/9.

Reply written 9.5 years ago by Lhl
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 9.5 years ago:

You can use XSLT to transform your xml to a tabular format:


<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns="http://www.w3.org/1999/xhtml"
 >


<xsl:output method="text"/>

<xsl:template match="/">
<xsl:apply-templates select="BlastOutput"/>
</xsl:template>



<xsl:template match="BlastOutput">
<xsl:variable name="queryDef" select="BlastOutput_query-def"/>
<xsl:variable name="queryLen" select="BlastOutput_query-len"/>
<xsl:for-each select="BlastOutput_iterations/Iteration/Iteration_hits/Hit">
<xsl:variable name="hitDef" select="Hit_def"/>
<xsl:variable name="hitLen" select="Hit_len"/>
<xsl:for-each select="Hit_hsps/Hsp">
<xsl:value-of select="$queryDef"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="$queryLen"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="$hitDef"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="$hitLen"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_bit-score"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_evalue"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_query-from"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_query-to"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_hit-from"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_hit-to"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_query-frame"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_hit-frame"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_identity"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_positive"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_gaps"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_align-len"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_qseq"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_hseq"/>
<xsl:text>&#9;</xsl:text>
<xsl:value-of select="Hsp_midline"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:for-each>
</xsl:template>



</xsl:stylesheet>

Example:

xsltproc --novalid blast2csv.xsl jeter.blast.xml

Result:

No definition line    99            159.983    9.34813e-45    1    99    1    105    1    1    99    99    6    105    ATGCCCGCCCTGCGCCCCGCTCTGCT---GTGGGCGCTGCTGGCGCTCTGGCTGTGCTG---CGCGGCCCCCGCGCATGCATTGCAGTGTCGAGATGGCTATGAA    ATGCCCGCCCTGCGCCCCGCTCTGCTAAAGTGGGCGCTGCTGGCGCTCTGGCTGTGCTGAAACGCGGCCCCCGCGCATGCATTGCAGTGTCGAGATGGCTATGAA    ||||||||||||||||||||||||||   ||||||||||||||||||||||||||||||   |||||||||||||||||||||||||||||||||||||||||||
No definition line    99            66.2076    1.5844e-16    1    36    106    141    1    1    36    36    0    36    ATGCCCGCCCTGCGCCCCGCTCTGCTGTGGGCGCTG    ATGCCCGCCCTGCGCCCCGCTCTGCTGTGGGCGCTG    ||||||||||||||||||||||||||||||||||||

Can this stylesheet be modified to handle multiple queries? That is, can it convert batch BLAST XML output into tabular format? I have tried and failed. -Ian McDowell

Reply written 8.9 years ago by User 725510

Thanks, Pierre. This seems the easiest of the approaches suggested here. However, I ran into this problem when I tried to run xsltproc:

./xslt: line 1: syntax error near unexpected token `newline'
./xslt: line 1: `<?xml version="1.0" encoding="UTF-8"?>'

I hope you can help me out of this. Thanks a lot.

Reply written 9.5 years ago by Lhl

Check that there are no characters before <?xml version... in either XML file (the BLAST output and the XSLT stylesheet). You can also download the stylesheet from here.
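
One way to run that check is a few lines of Python that show whatever precedes the XML declaration. This is only a sketch: the helper name `leading_junk` is made up, and it assumes the declaration, if present, sits within the first 1024 bytes of the file.

```python
def leading_junk(path):
    """Return the bytes found before '<?xml' in the file (empty if none)."""
    with open(path, "rb") as handle:
        head = handle.read(1024)  # assumption: the declaration is near the top
    pos = head.find(b"<?xml")
    if pos < 0:
        raise ValueError(f"no XML declaration in the first 1024 bytes of {path}")
    return head[:pos]
```

Calling `print(repr(leading_junk("blast2csv.xsl")))` prints `b''` when nothing precedes the declaration; anything else, such as a blank line or stray spaces, shows up in the printed bytes.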

Reply written 9.5 years ago by Pierre Lindenbaum (modified 13 months ago by RamRS)

What is batch XML output? Several concatenated XML files? If so, no, it won't work.
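
If the batch file really is several complete XML documents written one after another, one possible workaround is to split it back into separate documents and transform each one on its own. The following is a rough sketch under that assumption; `split_concatenated_xml` is a made-up helper, it expects every document to start with its own `<?xml` declaration, and it reads the whole text into memory, so very large files would need a streaming variant.

```python
def split_concatenated_xml(text):
    """Split text holding several XML documents back to back into a list.

    Assumption: each document begins with its own '<?xml' declaration.
    """
    marker = "<?xml"
    starts = []
    pos = text.find(marker)
    while pos != -1:
        starts.append(pos)
        pos = text.find(marker, pos + len(marker))
    # Pair every start offset with the next one (or the end of the text).
    ends = starts[1:] + [len(text)]
    return [text[a:b].strip() for a, b in zip(starts, ends)]
```

Each returned string can then be written to its own file and passed to xsltproc as in the example above.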

Reply written 8.9 years ago by Pierre Lindenbaum

First of all, thanks a lot for this script. Second, I cannot run it from the Terminal on my Mac. I get the following message: "failed to load external entity 'whatever.xslx'" and "cannot parse whatever.xslx".

How can I fix it? Am I making a mistake?

Thank you very much

Reply written 8.0 years ago by elmagodelabahia

xsltproc cannot find your file "whatever.xslx". Check the file name and the path.

Reply written 8.0 years ago by Pierre Lindenbaum
John wrote, 9.5 years ago:

For this purpose you can open the XML BLAST output file in a spreadsheet as an external data source. You may also find NOBLAST (New Options for BLAST) useful. NOBLAST is an open source program that provides a new user-friendly tabular output format for various NCBI BLAST programs (Blastn, Blastp, Blastx, Tblastn, Tblastx, Mega BLAST and Psi BLAST) without any use of a parser, and it provides E-value correction when a segmented BLAST database is used. Please read the complete publication here and download it from here.


Hi! I am very new to this field and do not have much experience in bioinformatics. I have downloaded NOBLAST, but I have a question about its installation: do I have to install BLAST on my computer before using it? How can I do that? Thanks a lot.

Reply written 8.1 years ago by elmagodelabahia
Dejian (United States) wrote, 9.5 years ago:

BioPerl gives some specific advice for dealing with this problem.


Yes, you are right. Many thanks!

Reply written 9.5 years ago by Lhl