Question: Python 3.4, Biopython 1.65:Screening out HSPs in .xml BLAST output where the subject and query are the same
0
gravatar for tyleraelliott
4.0 years ago by
Canada
tyleraelliott50 wrote:

I have BLAST output in .xml format that I am parsing for certain features and writing them to a file. Here is the code:

from Bio.Blast import NCBIXML

import sys


input_file=str(sys.argv[1])
output_file=str(sys.argv[2])
e_value_thresh=float(sys.argv[3]) 
result_handle=open(input_file)
blast_records=NCBIXML.parse(result_handle)

with open(output_file, "w") as f:
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect< e_value_thresh:
                    f.write('****Alignment****' + "\n")
                    f.write('Subject Sequence:'+ str(alignment.title) +"\n")
                    f.write("Query Sequence:  " + str(blast_record.query) + "\n") #also added
                    f.write('length:'+ str(alignment.length) +"\n")
                    f.write('e value:'+ str(hsp.expect) +"\n")
                    f.write(hsp.query[0:75] + '...'+"\n")
                    f.write(hsp.match[0:75] + '...'+"\n")
                    f.write(hsp.sbjct[0:75] + '...'+"\n")

 

 

What I need to do is also screen out HSPs where the subject and query are the same sequence, since my BLAST query and database files are identical. I thought of changing:

if hsp.expect< e_value_thresh:

to

if hsp.expect< e_value_thresh and str(alignment.title) is not str(blast_record.query):

but this generated an error instead. 

I would also like to write where the HSP match begins for the query and subject but

f.write('Subject Start:' + str(hsp.subjct_start) + "\n")

just gave an error saying that subjct_start was not an attribute of hsp

I'm still very new to Biopython and Python in general so I'm sure there is something very simple that I'm missing here, and I apologize if that's the case

blast biopython python • 2.0k views
ADD COMMENTlink modified 3.6 years ago by Biostar ♦♦ 20 • written 4.0 years ago by tyleraelliott50

Please read through the documentation here: http://biopython.org/DIST/docs/api/Bio.Blast.Record-module.html This will give you all modules, classes and methods available, and their context.

ADD REPLYlink written 4.0 years ago by RamRS21k

Right, that fixed my Start and End issue(dumb spelling mistake on my part, thanks!). I was viewing this document before and it did solve some other problems I was having but it wasn't able to answer how to screen out HSPs with the exact same query and subject. 

ADD REPLYlink written 4.0 years ago by tyleraelliott50

Maybe compare gi from the names?

ADD REPLYlink written 4.0 years ago by RamRS21k
1
gravatar for tyleraelliott
3.9 years ago by
Canada
tyleraelliott50 wrote:

I have solved it for anyone who was interested. It was a stupid mistake in how I wrote the inequality operator:

 

from Bio.Blast import NCBIXML
import sys


input_file=str(sys.argv[1])
output_file=str(sys.argv[2])
e_value_thresh=float(sys.argv[3]) 
result_handle=open(input_file)
blast_records=NCBIXML.parse(result_handle)

with open(output_file, "w") as f:
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            if str(alignment.hit_id) != str(blast_record.query): #this line eliminates HSPs with the same query and subject
                for hsp in alignment.hsps:
                    if (hsp.expect< e_value_thresh): 
                        f.write('****Alignment****' + "\n")
                        f.write('Subject hit id: ' + str(alignment.hit_id) + "\n") 
                        f.write('Subject Start:' + str(hsp.sbjct_start) + "\n") 
                        f.write('Subject End:' +str(hsp.sbjct_end) + "\n") 
                        f.write("Query Sequence:  " + str(blast_record.query) + "\n") 
                        f.write("Query Start:" + str(hsp.query_start) + "\n")
                        f.write("Query End:" + str(hsp.query_end) + "\n")
                        f.write('length:'+ str(alignment.length) +"\n")
                        f.write('e value:'+ str(hsp.expect) +"\n")
                        f.write(hsp.query[0:75] + '...'+"\n")
                        f.write(hsp.match[0:75] + '...'+"\n")
                        f.write(hsp.sbjct[0:75] + '...'+"\n")

ADD COMMENTlink written 3.9 years ago by tyleraelliott50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 821 users visited in the last hour