Question: Can Bio.Blast.NCBIXML somehow parse the stdout from NcbipsiblastCommandline so that no .xml file is created?
4.6 years ago by
United States
russianconcussion20 wrote:

Hi, I would like to parse the XML output of a local psiblast run (NcbipsiblastCommandline wrapper) by putting the stdout from the wrapper into a Python string variable and then using NCBIXML.parse to parse the contents of that variable. Is there any way to do this without writing a temporary file and without getting this error message:

record = next(psiblast_records)
  File "/usr/local/lib/python2.7/dist-packages/Bio/Blast/NCBIXML.py", line 617, in parse
    text = handle.read(BLOCK)
AttributeError: 'str' object has no attribute 'read'

 

Code:

#!/usr/bin/env python

#load modules
from sys import argv

from Bio.Blast.Applications import NcbipsiblastCommandline as psiblast

from Bio.Blast import NCBIXML
from Bio import Entrez

from Bio.Phylo.Applications import FastTreeCommandline as fasttree

#read arguments from command line: 1)amino-acid fasta to build psiblast profile, 2)maximum number of threads for each process
ref_fasta = argv[1]
threads = argv[2]

#use three iterations of psiblast to generate sequence diversity
blast = psiblast(query = ref_fasta, db = 'nr', outfmt = 5, num_alignments = 5000, num_threads = threads)

psiblast_out = blast()[0]


#parse the XML output
psiblast_records = NCBIXML.parse(psiblast_out)

record = next(psiblast_records)

ncbixml blast biopython • 3.5k views
4.6 years ago by
lelle780
Berlin
lelle780 wrote:

You can use the StringIO module to make an object that behaves like a file handle and can be passed to the parse function.


Worked perfectly. After reading your answer and searching for StringIO in the Biopython tutorial, I found many references to the module and how to use it with lots of the wrappers. Code for any other beginners who stumble on this post:

from cStringIO import StringIO

...

blast = psiblast(query = ref_fasta, db = 'nr', outfmt = 5, num_alignments = 5000, num_threads = threads)

psiblast_stdout = blast()[0]

#parse the XML output
psiblast_xml = StringIO(psiblast_stdout)

psiblast_records = NCBIXML.parse(psiblast_xml)
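
For readers on Python 3, where cStringIO no longer exists and io.StringIO takes its place, here is a minimal sketch of why the wrapper helps. It uses only the standard library: read_all is a hypothetical stand-in for a parser that calls handle.read(BLOCK) as in the traceback above, and xml_text is placeholder data rather than real psiblast output.

```python
from io import StringIO

BLOCK = 1024  # hypothetical block size, mirroring handle.read(BLOCK) in the traceback


def read_all(handle, block=BLOCK):
    """Consume a file-like handle block by block, as a parser would."""
    chunks = []
    while True:
        text = handle.read(block)
        if not text:
            break
        chunks.append(text)
    return "".join(chunks)


# Placeholder for the string returned by the psiblast wrapper (not real output).
xml_text = "<BlastOutput>" + "x" * 3000 + "</BlastOutput>"

# A plain str has no .read(), which is the AttributeError from the question.
try:
    read_all(xml_text)
except AttributeError as err:
    error_message = str(err)

# Wrapping the string gives it the file-handle interface the parser expects.
recovered = read_all(StringIO(xml_text))
```

In the real script, the StringIO object would be passed to NCBIXML.parse exactly as in the code above.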

Reply written 4.6 years ago by russianconcussion20
3.1 years ago by
Peter5.8k
Scotland, UK
Peter5.8k wrote:

First, use out="-" (this is the default) when building the BLAST+ command line. Rather than making a file named hyphen, this writes the output to stdout. Second, you will need to run the command line string from Biopython yourself using the subprocess module, so that you can read the child process's stdout as a handle. There is a related example using MUSCLE in the Biopython Tutorial - search for "MUSCLE using stdout" in http://biopython.org/DIST/docs/tutorial/Tutorial.html

The answer from Lelle using StringIO would also work, and would be quite simple and reliable, but it loads the entire XML output into memory as a string. That can be a problem for large datasets.
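
A rough sketch of the subprocess approach, under some loud assumptions: the command below is a stand-in (a tiny Python one-liner that prints XML) rather than a real psiblast call, and the standard library's xml.etree.ElementTree.iterparse stands in for NCBIXML.parse so the example runs without BLAST+ or Biopython. The point is only that the child's stdout is already a file-like handle that can be consumed incrementally.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET

# Stand-in for the psiblast command line (assumption: any program writing XML to stdout).
cmd = [
    sys.executable,
    "-c",
    "print('<BlastOutput><Iteration/><Iteration/></BlastOutput>')",
]


def count_iterations(command):
    """Run command and count <Iteration> elements while streaming its stdout."""
    count = 0
    with subprocess.Popen(command, stdout=subprocess.PIPE) as child:
        # child.stdout is a file-like handle, so the parser reads it in
        # blocks instead of needing the whole output as one string.
        for _event, elem in ET.iterparse(child.stdout, events=("end",)):
            if elem.tag == "Iteration":
                count += 1
                elem.clear()  # free the element once it has been handled
    return count


n_iterations = count_iterations(cmd)
```

With Biopython, the command would presumably come from str(blast) and the handle would be given to NCBIXML.parse instead of iterparse; that combination is not tested here.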
