This is my aim. I am retrieving a list of BLAST results in FASTA format for a single sequence. What I then want to do is sort any hits that are within a certain distance of each other into another list. Currently all I can do is work out the "start" location for any given hit within its genomic context:
def startfind(complete_handle, hit_handle): from Bio import SeqIO completes = list(SeqIO.parse(complete_handle, "fasta")) hits = list(SeqIO.parse(hit_handle, "fasta")) for index, hit_record in enumerate(hits): #This is adapted from another function where I actually use the index for something I just forgot to remove it here for complete_record in completes: if hit_record.id == complete_record.id: start = complete_record.seq.find(hit_record.seq) print hit_record.id, start
What I need to be able to do is this. For every single possible pair of hits where hit_record.id is equal, I want to subtract the "start" for one from the other and then check whether the absolute value/distance is within a certain range Something that would be along the lines of:
if (hit_record.id1 == hit_record.id2 and 7000 < abs(start1-start2) && abs(start1-start2) < 8000): print [some specific file] >> hit_record1 print [some specific file] >> hit_record2
But for every possible pair of hits. How can I do this?
EDIT: I just figured it out, but if anyone can think of a more efficient way (other formats of data are welcome too) I'd prefer that.
def fullfind(complete_handle, hit_handle): from Bio import SeqIO completes = list(SeqIO.parse(complete_handle, "fasta")) hits = list(SeqIO.parse(hit_handle, "fasta")) flanks = open("flanks.fasta", "w") for hit_record1 in hits: for hit_record2 in hits: for complete_record in completes: if hit_record1.id == complete_record.id and hit_record2.id == complete_record.id): start1 = complete_record.seq.find(hit_record1.seq) start2 = complete_record.seq.find(hit_record2.seq) if (8000 < start1-start2 and start1-start2 < 9000): #Don't need absolute values because every possible pair is considered and there is always one pair of hits where the difference is positive print complete_record.id