I have a XML file contained 50,000 genes Blast result with 10 hits for each gene. I want to mine desire genes Blast result from that XML file and the output file is still in XML format. The output file should have a full Blast result in XML form of desire genes.
I tired with the Python script shared by juefish on this posts, however, I only get partial of one gene Blast XML result parsed instead a list of my desire genes. I quote juefish's Python script here to make the discussion easier.
#!/usr/bin/env python import sys import os import sets import Bio from sets import Set from Bio.Blast import NCBIXML # Usage. if len(sys.argv) < 2: print "" print "This program extracts blast results from an xml file given a list of query sequences" print "Usage: %s -list file1 -xml file2 > outfile" print "-list: list of sequence names" print "-xml: blast xml output file" print "" sys.exit() # Parse args. for i in range(len(sys.argv)): if sys.argv[i] == "-list": infile1 = sys.argv[i+1] elif sys.argv[i] == "-xml": infile2 = sys.argv[i+1] fls = [infile1,infile2] results_handle = open(fls, "r") fin1 = open(fls,"r") geneContigs = Set() #establish list of names of queries to extract from xml file for line in fin1: temp=line.lstrip('>').split() geneContigs.add(temp) fin1.close() genes2 =  marker = False header = True #cycle through xml file and printing results that match to query list for line in results_handle: line = line.rstrip('\r\n') if(header==True): if(line.count("<Iteration>")>0): header=False print "%s" % (line) continue print "%s" % (line) else: if(line.count("<Iteration>")>0): genes2= if(marker==False): genes2.append(line) if(marker==True): if(line.count("</Iteration>")<1): print "%s" % (line) elif(line.count("</Iteration>")>0): print "%s" % (line) marker=False if(line.count("<Iteration_query-def>")>0): split1 = line.split(">") split2 = split1.split("<") if(split2 in geneContigs): for i in range(len(genes2)): print "%s" % (genes2[i]) marker=True genes2=
I'm not good in Python nor BioPython, it would be great if the community could share with me your experience or suggestion.
Thanks for your time.