I am trying to parse a BLAST result produced with the outfmt 6 option.
I have tried several approaches, with and without iterators, but each one fails to parse my file.
Here is some of the code I tried:
import argparse

from Bio.Blast import NCBIStandalone

parser = argparse.ArgumentParser()
parser.add_argument("blast_file", help="The path of the file containing blast result in xml format")
args = parser.parse_args()

results = open(args.blast_file, "r")
blast_parser = NCBIStandalone.BlastParser()
blast_records = blast_parser.parse(results)

for blast_record in blast_records:
    E_VALUE_THRESH = 0.0004
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                print('****Alignment****')
                print('sequence:', alignment.title)
                print('length:', alignment.length)
                print('e value:', hsp.expect)
                # Truncate long alignments to 75 characters
                if len(hsp.query) > 75:
                    dots = '...'
                else:
                    dots = ''
                print(hsp.query[0:75] + dots)
                print(hsp.match[0:75] + dots)
                print(hsp.sbjct[0:75] + dots)
But it then fails with this error:
python parse_last_hit.py /media/loutre/SUZUKII/assembly/duplication_removal/2017/Blast/Contig_37_orf.txt
/usr/lib/python2.7/dist-packages/Bio/Blast/NCBIStandalone.py:57: BiopythonDeprecationWarning: This module has been deprecated. Consider Bio.SearchIO for parsing BLAST output instead.
  "parsing BLAST output instead.", BiopythonDeprecationWarning)
/usr/lib/python2.7/dist-packages/Bio/ParserSupport.py:29: BiopythonDeprecationWarning: Bio.ParserSupport is now deprecated will be removed in a future release of Biopython.
  "future release of Biopython.", BiopythonDeprecationWarning)
Traceback (most recent call last):
  File "parse_last_hit.py", line 14, in <module>
    blast_records = blast_parser.parse(results)
  File "/usr/lib/python2.7/dist-packages/Bio/Blast/NCBIStandalone.py", line 836, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.7/dist-packages/Bio/Blast/NCBIStandalone.py", line 118, in feed
    read_and_call_until(uhandle, consumer.noevent, contains='BLAST')
  File "/usr/lib/python2.7/dist-packages/Bio/ParserSupport.py", line 320, in read_and_call_until
    line = safe_readline(uhandle)
  File "/usr/lib/python2.7/dist-packages/Bio/ParserSupport.py", line 400, in safe_readline
    raise ValueError("Unexpected end of stream.")
ValueError: Unexpected end of stream.
Searching for this error, I found that it may be caused by the BLAST output format (see "Problems With Biopython When Running The Ncbistandalone.Py Program").
I really don't want to go through XML: I have a lot of BLAST runs to do on huge sequences, and producing XML takes too much time and too much storage.
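To make clear what the input looks like: outfmt 6 is plain tab-separated text, one hit per line, with no header. The sketch below reads such a file with the standard library only and applies the same e-value cutoff as the code above. It assumes Python 3 and the default 12-column layout of BLAST+ (a custom column list passed to outfmt would need a different field list); the function name `iter_tabular_hits` and the two sample lines are made up for illustration. The deprecation warning in the output points to Bio.SearchIO, which as far as I know accepts a "blast-tab" format name, so that may be the Biopython equivalent of this.

```python
import csv
import io

# Default column order of BLAST+ "-outfmt 6" (assumption: no custom columns requested)
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def iter_tabular_hits(handle, evalue_threshold=0.0004):
    """Yield one dict per hit whose e-value is below the threshold.

    Hypothetical helper, not part of Biopython.
    """
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        # Skip blank lines and the '#' comment lines that outfmt 7 would add
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["length"] = int(hit["length"])
        if hit["evalue"] < evalue_threshold:
            yield hit


# Two made-up lines standing in for a real result file
sample = io.StringIO(
    "Contig_37_orf1\tsubjA\t98.50\t200\t3\t0\t1\t200\t10\t209\t1e-50\t350\n"
    "Contig_37_orf1\tsubjB\t70.00\t50\t15\t0\t1\t50\t5\t54\t0.5\t30.1\n"
)

hits = list(iter_tabular_hits(sample))
for hit in hits:
    print("sequence:", hit["sseqid"])
    print("length:", hit["length"])
    print("e value:", hit["evalue"])
```

With a real file, the `io.StringIO` sample would be replaced by `open(args.blast_file)`. Because the function is a generator, the file is read line by line rather than loaded whole, which matters for large result sets. Note that the tabular format carries no alignment strings, so there is nothing corresponding to `hsp.query`, `hsp.match`, or `hsp.sbjct`.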
Does anyone know a way to parse BLAST results in tabular format with Biopython? Thanks a lot for your answers!