Question: Removing duplicate leaves/tips from a Newick tree
2.3 years ago, ovon (20) wrote:

I'm having a hard time solving a problem with a set of phylogenetic trees. I'm downloading a Newick tree from an online database, and I need to pare it down to match an alignment that I created myself. The tip labels are GenBank accessions in the format ACCESSION.1.XXXX. The sequences in my alignment are also identified by GenBank accessions, but only by the ACCESSION portion.

The simplest way to make the tree match my alignment is to strip the '.1.XXXX' suffix from the tip names and then prune the tree to remove accessions that are not in the alignment. Both steps are easy with existing tools (bash, QIIME, etc.).
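A minimal Python sketch of the stripping and filtering step, assuming the accession is everything before the first '.' and that the alignment IDs are available as a list of strings. The helper names `strip_suffix` and `tips_in_alignment` and the example labels are hypothetical:

```python
from typing import Iterable, List


def strip_suffix(label: str) -> str:
    """Return the ACCESSION part of a tip label such as 'AB123456.1.1500'."""
    # Assumes the accession itself contains no '.'; everything after the
    # first '.' is treated as the suffix to discard.
    return label.split(".", 1)[0]


def tips_in_alignment(tip_labels: Iterable[str], alignment_ids: Iterable[str]) -> List[str]:
    """Return the full tree labels whose stripped accession is in the alignment."""
    wanted = set(alignment_ids)
    return [label for label in tip_labels if strip_suffix(label) in wanted]


# Hypothetical labels: two tree tips share accession A1.
print(tips_in_alignment(["A1.1.10", "A1.1.20", "B2.1.5", "C3.1.7"], ["A1", "C3"]))
# ['A1.1.10', 'A1.1.20', 'C3.1.7']
```

Both `A1` tips pass the filter, which is exactly where the duplicate labels come from.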

The problem is that stripping the suffix leaves several tips with the same label. I'd like to trim the tree so that only one tip remains for each duplicated label.

I'll need rules for choosing which tips to keep and which to delete, but for now I'd simply keep the one with the highest confidence value. I can adjust the rule later, once I have a framework for doing the pruning in the first place.
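The "keep the highest confidence" rule can be sketched independently of any tree library. This assumes you have already obtained a score for each tip (for example the support value of its parent node; how you get it depends on your tree and tooling) and stored it in a dict keyed by the full, still-unique tip label. `best_tip_per_accession` is a hypothetical helper:

```python
from typing import Dict


def best_tip_per_accession(scores: Dict[str, float]) -> Dict[str, str]:
    """Map each accession to the full tip label with the highest score.

    `scores` is a hypothetical input mapping full tip labels
    (ACCESSION.1.XXXX) to a confidence value; ties keep the label seen first.
    """
    best: Dict[str, str] = {}
    for label, score in scores.items():
        accession = label.split(".", 1)[0]  # drop the '.1.XXXX' suffix
        if accession not in best or score > scores[best[accession]]:
            best[accession] = label
    return best


print(best_tip_per_accession({"A1.1.10": 0.70, "A1.1.20": 0.95, "B2.1.5": 0.50}))
# {'A1': 'A1.1.20', 'B2': 'B2.1.5'}
```

The values of the returned dict are original labels, so you can prune on those unique names first and strip the suffixes afterwards; that way the tree never contains duplicate names while it is being pruned.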

modified 2.3 years ago by jhc (2.8k) • written 2.3 years ago by ovon (20)
2.3 years ago, jhc (2.8k) wrote:

Take a look at the prune function in ETE3. It takes care of removing single-child nodes and, if requested, preserving the original branch lengths.
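In ETE3 the corresponding call should be `Tree.prune(nodes, preserve_branch_length=True)`. One ordering detail helps with the duplicate problem: pick one original (still unique) label per accession, prune on those, and only strip the suffixes afterwards, so `prune` never has to deal with ambiguous names. The sketch below illustrates the same idea without any dependency, so it is not ETE3's implementation. It assumes a simple Newick string (no quoted labels, comments, or spaces inside labels), removes single-child nodes by adding their branch length to the surviving child, and uses a hypothetical `prune_to_alignment` driver that keeps the first tip per accession as a placeholder rule:

```python
from typing import Iterable, Iterator, List, Optional, Set


class Node:
    def __init__(self) -> None:
        self.name = ""                       # tip label or internal support label
        self.length: Optional[float] = None  # branch length to the parent
        self.children: List["Node"] = []


def parse(newick: str) -> Node:
    """Minimal Newick parser: no quoted labels, comments, or spaces in labels."""
    s = "".join(newick.split())  # drop all whitespace
    pos = 0

    def read_label() -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:;":
            pos += 1
        return s[start:pos]

    def read_node() -> Node:
        nonlocal pos
        node = Node()
        if s[pos] == "(":
            pos += 1  # consume '('
            node.children.append(read_node())
            while s[pos] == ",":
                pos += 1
                node.children.append(read_node())
            pos += 1  # consume ')'
        node.name = read_label()
        if pos < len(s) and s[pos] == ":":
            pos += 1
            node.length = float(read_label())
        return node

    return read_node()


def to_newick(node: Node) -> str:
    text = ""
    if node.children:
        text = "(" + ",".join(to_newick(c) for c in node.children) + ")"
    text += node.name
    if node.length is not None:
        text += ":" + format(node.length, "g")
    return text


def leaves(node: Node) -> Iterator[Node]:
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


def prune(node: Node, keep: Set[Node]) -> Optional[Node]:
    """Keep only leaves in `keep`; collapse single-child nodes, summing lengths."""
    if not node.children:
        return node if node in keep else None
    pruned = (prune(child, keep) for child in node.children)
    node.children = [c for c in pruned if c is not None]
    if not node.children:
        return None
    if len(node.children) == 1:
        child = node.children[0]
        if node.length is not None:  # preserve the total root-to-tip distance
            child.length = (child.length or 0.0) + node.length
        return child
    return node


def prune_to_alignment(newick: str, alignment_ids: Iterable[str]) -> str:
    root = parse(newick)
    wanted = set(alignment_ids)
    keep: Set[Node] = set()
    seen: Set[str] = set()
    for leaf in leaves(root):
        accession = leaf.name.split(".", 1)[0]
        # Placeholder rule: the first tip found for each accession wins.
        if accession in wanted and accession not in seen:
            seen.add(accession)
            keep.add(leaf)
    pruned = prune(root, keep)
    if pruned is None:
        return ";"
    for leaf in leaves(pruned):
        leaf.name = leaf.name.split(".", 1)[0]  # strip suffixes only after pruning
    return to_newick(pruned) + ";"


tree = "((A1.1.10:0.1,A1.1.20:0.2)0.9:0.3,(B2.1.5:0.4,C3.1.7:0.5)0.8:0.6);"
print(prune_to_alignment(tree, ["A1", "C3"]))  # (A1:0.4,C3:1.1);
```

Support labels on collapsed nodes (here 0.9 and 0.8) are dropped along with the nodes. The recursion is fine for typical trees but could hit Python's default recursion limit on very deep, ladder-like trees.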



Powered by Biostar version 2.3.0