Question: Using io.StringIO() with Biopython Phylo.parse()
mbio.kyle (United States) wrote, 3.1 years ago:

Hello,

I am working with the Biopython Phylo package. I have an outside data source (an API) from which I pull down trees as Newick strings. I am attempting to construct Biopython Phylo trees from these strings using Phylo.parse().

My goal is to do so without having to write the Newick strings to a file (since parse takes a file handle). I am attempting to use io.StringIO as a replacement, simply passing an instance of it to Phylo.parse(). It is not working: I get back an empty tree list. Does anyone know if this is because Biopython does not support StringIO instances as file handles? Is there some other way to create Biopython trees directly from a string?

Here is my code

trees_file = io.StringIO() # this is the 'file handle'
for analysis in analyses:
   # get the string, I know this works
   uni_tree = unicode(analysis.taxonomy_as_newick())
   trees_file.write(uni_tree)
trees = list(Phylo.parse(trees_file, 'newick'))

At this point trees is an empty list. If I print the contents of trees_file to the console, I get the expected value (a bunch of newick trees). Any thoughts? 

Thanks!

a.zielezinski wrote, 3.1 years ago:

Phylo.parse() will work on a list of tree strings.

from Bio import Phylo

trees_list = []
for analysis in analyses:
    uni_tree = unicode(analysis.taxonomy_as_newick())  # Python 2; use str() on Python 3
    # Make sure the uni_tree string ends with '\n'.
    trees_list.append(uni_tree)
trees = list(Phylo.parse(trees_list, 'newick'))
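If you would rather hand Phylo.parse() a real file-like object than a bare list, something along these lines should work. This is only a rough sketch: newick_handle and parse_trees are names made up for illustration, and it assumes each string is one complete Newick tree ending in ';' and that Biopython is installed when parse_trees is called.

```python
import io


def newick_handle(newick_strings):
    """Join Newick strings, one per line, into an in-memory text handle.

    Hypothetical helper: a StringIO created from an initial string starts
    at position 0, so no seek(0) is needed before reading from it.
    """
    text = "\n".join(s.strip() for s in newick_strings) + "\n"
    return io.StringIO(text)


def parse_trees(newick_strings):
    """Parse the strings into a list of Bio.Phylo trees (needs Biopython)."""
    from Bio import Phylo  # imported here so newick_handle works without it

    return list(Phylo.parse(newick_handle(newick_strings), "newick"))
```

With the loop from the question you would collect the strings first, e.g. parse_trees(str(a.taxonomy_as_newick()) for a in analyses) on Python 3.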
Eric T. (San Francisco, CA) wrote, 3.1 years ago:

Since you've written to the StringIO object, treating it as a file, the "cursor" is at the end of the file when you finish the loop. Then when Phylo.parse reads the file, it's already at the end and there's nothing left to read.

Just add "trees_file.seek(0)" to bring the cursor to the start, and Phylo will read all the data you've written to it.

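import io
from Bio import Phylo

trees_file = io.StringIO()
for analysis in analyses:
    # unicode() is Python 2; on Python 3 use str()
    uni_tree = unicode(analysis.taxonomy_as_newick())
    trees_file.write(uni_tree)
# Rewind to the start so the parser sees everything written above
trees_file.seek(0)
trees = list(Phylo.parse(trees_file, 'newick'))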
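To see the cursor behaviour on its own, without Biopython in the picture, here is a small standard-library demo (Python 3). The two toy Newick strings are made up for illustration. It also shows why printing the contents looked fine in the question: getvalue() returns the whole buffer regardless of where the cursor is.

```python
import io

buf = io.StringIO()
buf.write("(A,B);\n")
buf.write("(C,D);\n")

# After writing, the position sits at the end of the buffer,
# so read() finds nothing left to return.
at_end = buf.read()

# getvalue() ignores the position and returns everything written.
full = buf.getvalue()

# Rewinding makes the written data readable again.
buf.seek(0)
after_seek = buf.read()

# A StringIO built from an initial string starts at position 0,
# so it can be read immediately without seek(0).
fresh = io.StringIO(full).read()
```

Here at_end is the empty string, while after_seek and fresh both hold the full text.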