Question

Removing duplicate leaves/tips from newick tree

7.4 years ago

ovon ▴ 20

I'm having a hard time finding a solution to a problem with a set of phylogenetic trees. I'm getting a newick tree from an online database, and I need to pare it down to match an alignment that I created myself. The tip labels are GenBank accessions in this format: ACCESSION.1.XXXX. The accessions from my alignment are also GenBank accessions, but with just the ACCESSION portion of the above.

The simplest way to filter the tree to match my alignment is to strip off the '.1.XXXX' portion of the tree tip names, and then prune the tree to remove accessions not present in the alignment. This is easy to achieve with existing tools, bash, QIIME, etc.

The problem is that stripping the last portion of the tip names leaves the tree with non-unique tip labels. I'd like to figure out how to trim the tree so that, for each duplicated label, all but one of the tips carrying it are removed.

I'd have to establish some rules about how to choose which tips I'd like to preserve and which I'd like to delete, but for now I think I'd just prefer to keep the one with the highest confidence value. I can always adjust later if needed once I have some kind of framework to do the pruning in the first place.
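To make the rule concrete, here is a rough sketch of the selection step in plain Python, independent of any tree library. It assumes the tip labels follow the ACCESSION.1.XXXX pattern and that a confidence value is already available for each tip. The helper names `base_accession` and `pick_best_tips` are hypothetical, and the accessions and scores are made up for illustration.

```python
def base_accession(label):
    """Return the part of a tip label before the first dot."""
    return label.split(".", 1)[0]


def pick_best_tips(scores, wanted):
    """Map each wanted accession to the single full tip label to keep.

    scores: dict of full tip label -> confidence value (assumed to be given)
    wanted: set of bare accessions present in the alignment
    """
    best = {}
    for label, score in scores.items():
        acc = base_accession(label)
        if acc not in wanted:
            continue  # not in the alignment, so it gets pruned anyway
        # Keep the first label seen, or replace it if this one scores higher.
        if acc not in best or score > scores[best[acc]]:
            best[acc] = label
    return best


# Made-up example data.
scores = {
    "AB000001.1.1450": 0.91,
    "AB000001.1.1502": 0.97,
    "AB000002.1.1388": 0.80,
    "AB000003.1.1400": 0.99,
}
wanted = {"AB000001", "AB000002"}
keep = pick_best_tips(scores, wanted)
```

The values of `keep` are the full tip labels to retain. Swapping in a different rule later only means changing the comparison inside the loop.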

newick tree phylogeny pruning trees • 4.9k views

updated 7.4 years ago by jhc ★ 3.0k • written 7.4 years ago by ovon ▴ 20

score 0 · Answer 1 · 2016-11-26

You can take a look at the prune function of ETE3. It takes care of deleting single-child nodes and preserving original branch lengths if necessary:

http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html#pruning-trees
http://etetoolkit.org/docs/latest/reference/reference_tree.html#ete3.TreeNode.prune
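A minimal sketch of how this could look for the question above, not a tested pipeline. It assumes tip labels of the form ACCESSION.1.XXXX, and it interprets "confidence" as the support value of each tip's parent node, which may not be what you want. The function names `dedupe_and_prune` and `prune_newick_file` are hypothetical; the ETE3 calls used are `Tree`, `iter_leaves`, `prune` and `write`.

```python
def dedupe_and_prune(tree, wanted):
    """Prune a tree to one tip per wanted accession, then rename the tips.

    tree:   an ete3.Tree (only iter_leaves(), prune() and the node
            attributes name, up and support are used)
    wanted: set of bare accessions present in the alignment
    """
    best = {}
    for leaf in tree.iter_leaves():
        acc = leaf.name.split(".", 1)[0]  # drop the '.1.XXXX' suffix
        if acc not in wanted:
            continue
        # Assumption: "confidence" is the support of the tip's parent node.
        score = leaf.up.support if leaf.up is not None else 0.0
        if acc not in best or score > best[acc][0]:
            best[acc] = (score, leaf)

    keep = [leaf for _, leaf in best.values()]
    # prune() removes everything else and collapses single-child nodes;
    # preserve_branch_length=True keeps the original tip-to-tip distances.
    tree.prune(keep, preserve_branch_length=True)

    # Rename the surviving tips to the bare accession used in the alignment.
    for acc, (_, leaf) in best.items():
        leaf.name = acc
    return tree


def prune_newick_file(in_path, out_path, wanted):
    """Load a newick file with ETE3, prune it, and write the result."""
    from ete3 import Tree  # requires the ete3 package to be installed

    tree = Tree(in_path)  # default parser reads internal values as support
    dedupe_and_prune(tree, wanted)
    tree.write(outfile=out_path)
```

Usage would be something like `prune_newick_file("tree.nwk", "pruned.nwk", wanted)`, where both file names are placeholders and `wanted` is the set of accessions from the alignment. If the internal node labels in your tree are not support values, the `format` argument of `Tree` would need adjusting, and the scoring line is the only place to change if you settle on a different rule.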