Question: How do you extract the query title from a BLAST XML output using Biopython?
Asked by Farhat (Pune, India), 8.3 years ago:

I would like to find the alignment title along with the query title and the expect value in a BLAST XML output file with many query sequences. I can get the alignment title and expect value but the query title is eluding me. How do I extract that?

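from Bio.Blast import NCBIXML

# result_handle is an open handle to the BLAST XML file
blast_records = NCBIXML.parse(result_handle)
# (calling next(blast_records) here would consume the first record,
# so the loop below would silently skip the first query)

for blast_record in blast_records:
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print(alignment.title, hsp.expect, blast_record.header.query)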
Tags: biopython, blast

Can you share a link to your output file?

Reply by Thaman, 8.3 years ago

Thaman: that won't be of much help since any result file should have its query name in the same place.

Reply by Michael Schubert, 8.3 years ago
Answer by David W (New Zealand), 8.3 years ago:

Is blast_record.query what you're looking for?


Yes, that does the job.

Reply by Farhat, 8.3 years ago
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 8.3 years ago:

I'm not a Python or Biopython guy, but if your document is large, it could be a bad idea to load it all into memory. Here is an event-based (SAX) solution:

import sys
import xml.sax
from io import BytesIO
from xml.sax.handler import ContentHandler, EntityResolver
from xml.sax.xmlreader import InputSource


class BlastHandler(ContentHandler, EntityResolver):
    """Print hit definition, e-value and query title for every HSP."""

    # BLAST+ puts each query's title in Iteration_query-def;
    # BlastOutput_query-def only holds the first query's title.
    QUERY_TAGS = ("BlastOutput_query-def", "Iteration_query-def")
    CAPTURE_TAGS = ("Hit_def", "Hsp_evalue") + QUERY_TAGS

    def __init__(self):
        super().__init__()
        self.content = None  # text buffer for the element being captured
        self.hitdef = None
        self.query = None

    def startElement(self, name, attrs):
        if name in self.CAPTURE_TAGS:
            self.content = ""

    def endElement(self, name):
        if name == "Hsp_evalue":
            # one output line per HSP
            print(self.hitdef, self.content, self.query)
        elif name == "Hit_def":
            self.hitdef = self.content
        elif name in self.QUERY_TAGS:
            self.query = self.content
        self.content = None

    def characters(self, chars):
        # characters() may be called several times for one element
        if self.content is not None:
            self.content += chars

    def resolveEntity(self, publicId, systemId):
        # if external entities are enabled, resolve the NCBI DTD
        # to an empty stream instead of fetching it
        source = InputSource()
        source.setByteStream(BytesIO(b""))
        return source


if __name__ == "__main__":
    handler = BlastHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.setEntityResolver(handler)
    with open(sys.argv[1], "rb") as handle:
        parser.parse(handle)

The Biopython parser should only deal with the hits for one query at a time, so big multi-query BLAST XML files are not such a problem.

Reply by Peter, 8.3 years ago

I would suggest the cElementTree API instead of SAX.

Reply by Thaman, 8.3 years ago
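For comparison, here is a minimal sketch of the ElementTree approach using the standard library's `xml.etree.ElementTree.iterparse` (the separate `cElementTree` module was deprecated and removed in Python 3.9; plain `ElementTree` uses the C accelerator automatically). It assumes BLAST+ style output, where each query is an `<Iteration>` element with its title in `<Iteration_query-def>`. The function name `iter_blast_hsps` and the tiny `SAMPLE` document are hypothetical, made up for illustration rather than taken from real BLAST output.

```python
import io
import xml.etree.ElementTree as ET


def iter_blast_hsps(source):
    """Yield (query_title, hit_def, evalue) for every HSP in a BLAST XML file.

    Sketch only: assumes BLAST+ style output where each query is an
    <Iteration> with its title in <Iteration_query-def>.
    """
    query = hit_def = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag
        if tag in ("BlastOutput_query-def", "Iteration_query-def"):
            # the per-iteration title overrides the top-level one
            query = elem.text
        elif tag == "Hit_def":
            hit_def = elem.text
        elif tag == "Hsp_evalue":
            yield query, hit_def, float(elem.text)
        elif tag == "Iteration":
            # drop the finished query's subtree so memory use stays small
            elem.clear()


# Hypothetical two-query document, for illustration only.
SAMPLE = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_query-def>query1</BlastOutput_query-def>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>hitA</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp>
            <Hsp><Hsp_evalue>0.002</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>query2</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>hitB</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>3.5</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

for query_title, hit, evalue in iter_blast_hsps(io.BytesIO(SAMPLE)):
    print(query_title, hit, evalue)
```

With a real file you would pass the file name (or an open binary handle) instead of `io.BytesIO(SAMPLE)`. Because the function is a generator and clears each `<Iteration>` once it has been read, it processes one query at a time, similar in spirit to how `NCBIXML.parse` yields one record per query.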

Out of interest, why? Other (newer) parts of Biopython do use ElementTree, but the BLAST XML parser (which is older) went with SAX.

Reply by Peter, 7.7 years ago

The Biopython parser does deal with only one query at a time, so memory is not much of an issue.

Reply by Farhat, 8.3 years ago