Question

Parsing UniProt BLAST Output

0

10.6 years ago

jcastrofigueroa ▴ 140

Dear all: Does anyone know how to parse BLAST output in EBI XML format? I ran many local BLAST searches (blastp) using UniRef90 as the database, and now I have an XML file with all the results that I'd like to parse, but I don't know how. Can anyone please give me a clue about how to do this in Biopython?

Thanks a lot

uniprot biopython • 3.7k views

updated 10.6 years ago by Hamish ★ 3.2k • written 10.6 years ago by jcastrofigueroa ▴ 140

1

Did you try XSLT instead of Python? Related posts: "BLAST stylesheet"; "Extract 100 downstream sequence of the aligned sequence of blast."; "How to capture the Blast result in a string variable to save in a database using BioPerl"; "tools parsing NCBI blast -m 7 xml output format?"; "Taking only aligned sequences in a BLAST"

10.6 years ago by Pierre Lindenbaum 161k

0

No, I haven't heard about that before. I'll check it out. Thanks.

10.6 years ago by jcastrofigueroa ▴ 140

score 2 · Answer 1 · 2013-10-01

Assuming you mean the EMBL-EBI Application XML, as produced by the EMBL-EBI Web Services for sequence similarity searches, then the XML Schema for the format (as referenced in the results) can be found at:

http://www.ebi.ac.uk/Tools/common/schema/ApplicationResult.xsd

This includes documentation describing the contents of the various elements, and can be used with tools supporting parser generation from an XML schema.

As far as I am aware, Biopython does not support this format (it does support NCBI BLAST XML and the plain text output formats, see http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc91). However, Python itself contains support for handling XML documents (see 19. Structured Markup Processing Tools in the Python documentation), and there are many packages that provide additional capabilities (see https://wiki.python.org/moin/PythonXml). So developing your own code to extract the required information should not be too difficult.
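To give a rough idea of what such hand-written code might look like, here is a minimal sketch using only the standard library's xml.etree.ElementTree. The element and attribute names (hit, alignment, expectation, identity, id and so on), the namespace URI and the tiny embedded sample document are assumptions made for illustration, so check them against ApplicationResult.xsd and your own files before relying on them. The helper names local_name and iter_hits are likewise made up for this example.

```python
import io
import xml.etree.ElementTree as ET

# Made-up miniature document for illustration only. Element names,
# attribute names and the namespace URI are ASSUMPTIONS; verify them
# against ApplicationResult.xsd and a real result file.
SAMPLE_XML = """<EBIApplicationResult xmlns="http://www.ebi.ac.uk/schema">
  <SequenceSimilaritySearchResult>
    <hits total="2">
      <hit number="1" database="uniref90" id="UniRef90_EXAMPLE1" length="120" description="Hypothetical protein A">
        <alignments total="1">
          <alignment number="1">
            <score>250</score>
            <bits>100.9</bits>
            <expectation>1e-25</expectation>
            <identity>48.0</identity>
          </alignment>
        </alignments>
      </hit>
      <hit number="2" database="uniref90" id="UniRef90_EXAMPLE2" length="98" description="Hypothetical protein B">
        <alignments total="1">
          <alignment number="1">
            <score>90</score>
            <bits>39.3</bits>
            <expectation>0.002</expectation>
            <identity>31.0</identity>
          </alignment>
        </alignments>
      </hit>
    </hits>
  </SequenceSimilaritySearchResult>
</EBIApplicationResult>
"""


def local_name(tag):
    """Drop a '{namespace}' prefix so lookups work whatever the namespace URI is."""
    return tag.rsplit("}", 1)[-1]


def iter_hits(source):
    """Yield one dict per alignment: the hit's attributes plus the text of
    each simple child element of the alignment (all values stay strings)."""
    tree = ET.parse(source)  # accepts a file name or an open file object
    for hit in tree.iter():
        if local_name(hit.tag) != "hit":
            continue
        for aln in hit.iter():
            if local_name(aln.tag) != "alignment":
                continue
            row = dict(hit.attrib)
            for child in aln:
                if child.text and child.text.strip():
                    row[local_name(child.tag)] = child.text.strip()
            yield row


# Usage: swap the StringIO object for the path to your own XML file.
for row in iter_hits(io.StringIO(SAMPLE_XML)):
    print(row["id"], row["expectation"], row["identity"], sep="\t")
```

Matching on the local tag name rather than a hard-coded namespace keeps the sketch working even if the namespace in your files differs from the one assumed here. Values are left as strings, so convert them with float() or int() where you need to filter or sort.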

Alternatively, as Pierre Lindenbaum suggests, you could develop an XML transformation using XSLT to convert the XML into something simpler that contains just the information you need (you might find the XSLT Tutorial and the XPath Tutorial from W3Schools helpful if you want to try this). Unfortunately, the examples Pierre cites are based on the NCBI BLAST XML, so you will not be able to use them directly, but they do illustrate the concept and may help when trying to work out some of the trickier aspects of the conversion.

Note: the EMBL-EBI Web Services for NCBI BLAST, PSI-BLAST and WU-BLAST provide the option of producing NCBI BLAST XML instead of EMBL-EBI Application XML; see the alignment format parameter for details.