Question: Python 3.4, Biopython 1.65: Writing both subject and query IDs to a file from parsed BLAST output
tyleraelliott (Canada) wrote, 3.6 years ago:

I've run a nucleotide BLAST of a multi-FASTA file against itself to find sequences that share highly similar flanking sequence, which would indicate a duplication event. From the .xml file this produces, I need to pick out cases where significant BLAST hits come from different sequences, not from a sequence aligning to itself (query and subject identical). Essentially, I want to parse the .xml output and print details about each HSP together with the IDs of the subject and query in that HSP. I tried f.write(header.query_id) and f.write(parameters.query_id), but both just raise an error. I suspect they need to go in a separate for loop, but right now I'm very lost.

My code is based on section 7.4 of the Biopython 1.65 manual:

 

from Bio.Blast import NCBIXML
import sys

# Usage: python screener.py inputfile.xml outputfile.txt
input_file = sys.argv[1]
output_file = sys.argv[2]

# Screen the XML for HSPs with e-value < 0.01 and write them to a file;
# both files are closed automatically when the with-block ends
with open(input_file) as result_handle, open(output_file, "w") as f:
    blast_records = NCBIXML.parse(result_handle)
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < 0.01:
                    f.write('****Alignment****' + "\n")
                    f.write('sequence:' + str(alignment.title) + "\n")
                    f.write('length:' + str(alignment.length) + "\n")
                    f.write('e value:' + str(hsp.expect) + "\n")
                    f.write(hsp.query[0:75] + '...' + "\n")
                    f.write(hsp.match[0:75] + '...' + "\n")
                    f.write(hsp.sbjct[0:75] + '...' + "\n")

 

Any suggestions would be, as always, greatly appreciated. 

Tags: blast, biopython, python

Have you tried blast_record.query_id?

(comment by RamRS, 3.6 years ago)

Just gave that a shot. The output is of the form Query_2, Query_593, etc., so it's not very helpful.

(reply by tyleraelliott, 3.6 years ago)
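The Query_N labels come from the XML itself. In BLAST+ XML output each query gets one `Iteration` element, which carries an `Iteration_query-ID` (a local label such as Query_2) and an `Iteration_query-def` (the FASTA definition line). Each hit has a `Hit_def` holding the subject's definition line. As far as I can tell, Biopython's NCBIXML exposes these as `blast_record.query_id`, `blast_record.query` and `alignment.hit_def`. The sketch below reads them directly with the standard library so you can see the difference. The XML excerpt is hand-written and heavily trimmed, the sequence names are hypothetical, and `query_labels` is a made-up helper, not part of Biopython.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed excerpt in the shape of BLAST+ XML output.
# Sequence names are hypothetical; real files contain many more elements.
EXAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_query-ID>Query_2</Iteration_query-ID>
      <Iteration_query-def>seqA flanking region</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>seqB flanking region</Hit_def>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def query_labels(xml_text):
    """Return (query_id, query_def, [hit_def, ...]) for each Iteration (one per query)."""
    root = ET.fromstring(xml_text)
    labels = []
    for iteration in root.iter("Iteration"):
        # Definition lines of every subject hit for this query
        hit_defs = [hit.findtext("Hit_def") for hit in iteration.iter("Hit")]
        labels.append((iteration.findtext("Iteration_query-ID"),
                       iteration.findtext("Iteration_query-def"),
                       hit_defs))
    return labels


for query_id, query_def, hit_defs in query_labels(EXAMPLE_XML):
    # prints: Query_2 | seqA flanking region | ['seqB flanking region']
    print(query_id, "|", query_def, "|", hit_defs)
```

The `Iteration_query-ID` value is only a running label, so the definition lines are the fields that identify your sequences.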
Answer by tyleraelliott (Canada), 3.6 years ago:

OK, I played around some more and solved the problem. What was needed was blast_record.query, which holds the query's definition line rather than the Query_N label:

f.write(str(blast_record.query))

So the full code looks like this:

 

from Bio.Blast import NCBIXML
import sys

# Usage: python screener.py inputfile.xml outputfile.txt
input_file = sys.argv[1]
output_file = sys.argv[2]

# Screen the XML for HSPs with e-value < 0.01 and write them to a file;
# both files are closed automatically when the with-block ends
with open(input_file) as result_handle, open(output_file, "w") as f:
    blast_records = NCBIXML.parse(result_handle)
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < 0.01:
                    f.write('****Alignment****' + "\n")
                    f.write('sequence:' + str(alignment.title) + "\n")  # subject (hit) title
                    f.write('ID:' + str(blast_record.query_id) + "\n")  # BLAST's label, e.g. Query_2
                    f.write("Query Sequence: " + str(blast_record.query) + "\n")  # query definition line
                    f.write('length:' + str(alignment.length) + "\n")
                    f.write('e value:' + str(hsp.expect) + "\n")
                    f.write(hsp.query[0:75] + '...' + "\n")
                    f.write(hsp.match[0:75] + '...' + "\n")
                    f.write(hsp.sbjct[0:75] + '...' + "\n")
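The code above labels each HSP with its query and subject but still reports self-hits, which the original question wanted to exclude. Below is a minimal sketch of one way to skip them. It assumes that the sequence name is the first word of both `blast_record.query` and `alignment.hit_def` (the subject's definition line; in NCBIXML `alignment.title` is, as far as I know, the hit ID followed by that line). If your XML is laid out differently, check what those fields contain and adjust `first_word`. All helper names (`first_word`, `non_self_hits`, `write_report`, `screen_xml`) are hypothetical, not Biopython API. Biopython is imported only inside `screen_xml`, so the filter also works on any objects with the same attributes.

```python
def first_word(text):
    """Return the first whitespace-delimited token of a definition line ('' if empty)."""
    parts = text.split()
    return parts[0] if parts else ""


def non_self_hits(blast_records, max_evalue=0.01):
    """Yield (query_name, subject_name, alignment, hsp) for HSPs with
    e-value < max_evalue whose subject is a different sequence from the query.

    Assumption: the sequence name is the first word of both
    blast_record.query and alignment.hit_def.
    """
    for record in blast_records:
        query_name = first_word(record.query)
        for alignment in record.alignments:
            subject_name = first_word(alignment.hit_def)
            if subject_name == query_name:
                continue  # self-hit: the query aligned to itself
            for hsp in alignment.hsps:
                if hsp.expect < max_evalue:
                    yield query_name, subject_name, alignment, hsp


def write_report(blast_records, out, max_evalue=0.01):
    """Write one block per non-self HSP, in the same style as the script above."""
    for query_name, subject_name, alignment, hsp in non_self_hits(blast_records, max_evalue):
        out.write("****Alignment****\n")
        out.write("query: %s\n" % query_name)
        out.write("subject: %s\n" % subject_name)
        out.write("length: %s\n" % alignment.length)
        out.write("e value: %s\n" % hsp.expect)
        out.write(hsp.query[:75] + "...\n")
        out.write(hsp.match[:75] + "...\n")
        out.write(hsp.sbjct[:75] + "...\n")


def screen_xml(xml_path, out_path, max_evalue=0.01):
    """Parse a BLAST XML file with Biopython and write only the non-self hits."""
    from Bio.Blast import NCBIXML  # imported here so the helpers above don't need Biopython

    with open(xml_path) as handle, open(out_path, "w") as out:
        # NCBIXML.parse is lazy, so consume it while the input file is still open
        write_report(NCBIXML.parse(handle), out, max_evalue)
```

Usage would be `screen_xml("inputfile.xml", "outputfile.txt")`. Comparing names rather than HSP coordinates means a duplication inside a single sequence is also skipped, which matches the "different sequences" goal but may not suit every screen.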

 
