Using io.StringIO() with biopython Phylo parse
6.8 years ago
mbio.kyle ▴ 380

Hello,

I am working with the Biopython Phylo package. I have an outside data source (an API) from which I pull down trees as Newick strings. I am attempting to construct Biopython Phylo trees from these strings using Phylo.parse().

My goal is to do so without having to write the Newick strings to a file (since parse takes a file handle). I am attempting to use io.StringIO as a replacement, simply passing an instance of it to Phylo.parse(). It is not working: it returns an empty tree list. Does anyone know if this is due to Biopython not supporting StringIO instances as file handles? Is there some other way to create Biopython trees directly from a string?

Here is my code

trees_file = io.StringIO()  # this is the 'file handle'
for analysis in analyses:
    # get the string, I know this works
    uni_tree = unicode(analysis.taxonomy_as_newick())
    trees_file.write(uni_tree)
trees = list(Phylo.parse(trees_file, 'newick'))


At this point trees is an empty list. If I print the contents of trees_file to the console, I get the expected value (a bunch of newick trees). Any thoughts?

Thanks!

python biopython phylogenetics tree newick
6.8 years ago

Phylo.parse() will work on a list of tree strings.

trees_list = []
for analysis in analyses:
    uni_tree = unicode(analysis.taxonomy_as_newick())
    # Make sure the uni_tree string ends with '\n'.
    trees_list.append(uni_tree)
trees = list(Phylo.parse(trees_list, 'newick'))

6.8 years ago
Eric T. ★ 2.7k

Since you've written to the StringIO object, treating it as a file, the "cursor" is at the end of the file when you finish the loop. Then, when Phylo.parse reads the file, it's already at the end and there's nothing left to read.

Just add trees_file.seek(0) to bring the cursor to the start, and Phylo will read all the data you've written to it.

trees_file = io.StringIO()
for analysis in analyses:
    uni_tree = unicode(analysis.taxonomy_as_newick())
    trees_file.write(uni_tree)
trees_file.seek(0)
trees = list(Phylo.parse(trees_file, 'newick'))
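
The cursor behaviour can be checked with the standard library alone. Here is a minimal sketch for Python 3; the Newick strings are made up and simply stand in for whatever the API returns, and each tree is written on its own line on the assumption that the parser reads the handle line by line:

```python
import io

# Hypothetical Newick strings standing in for the API results.
newick_strings = ["(A,B,(C,D));", "((E,F),G);"]

buf = io.StringIO()
for s in newick_strings:
    buf.write(s + "\n")  # one tree per line

# After writing, the cursor sits at the end, so a read returns nothing.
at_end = buf.read()

# Rewind to the start; now the whole content can be read back.
buf.seek(0)
lines = buf.readlines()

# Alternative: build the buffer from a complete string. The cursor then
# starts at position 0, so no seek() is needed before reading.
buf2 = io.StringIO("\n".join(newick_strings) + "\n")
lines2 = buf2.readlines()
```

Here at_end is an empty string, which matches the empty tree list in the question, while lines and lines2 both hold the two trees. Handing a rewound buffer like buf (or a freshly built one like buf2) to Phylo.parse(handle, 'newick') should then give one tree per line. On Python 3 the unicode() call from the original code is not needed, since str is already Unicode.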