Question

Phylogenetic distance between species


8.4 years ago

Jautis ▴ 530

Hi, I have a phylogenetic tree derived from genotype information from multiple individuals per species. It looks something like this:

(((Spec1.77,(Spec1:31,Spec1:31):1.77):0,(Spec2:4.17,(Spec2.14,(Spec2:27.24,Spec2:27.24):3.14):4.17):0):0,(Spec3:1.8,(Spec3:0.4,(Spec3:0.7,(Spec3:18.3,Spec3.1:18.3):0.7):0.4):1.8):0)

How would I simplify this tree to capture only the distances between species, i.e., produce a tree like ((Spec1:x1,Spec2:x2),Spec3:x3)?

Is there an efficient way to do this for a large tree?

phylogenetics newick trees • 2.6k views

Updated 20 months ago by Ram 43k • written 8.4 years ago by Jautis ▴ 530

Ram · Answer 1 · 2015-11-17

Your tree resembles a regular gene tree with duplications. However, it's not clear to me whether 1) the duplicated items are always arranged as in your example (all branches from the same species grouped together), or 2) you could also have complex patterns like ((spAseq1, spBseq1), (spAseq2, spBseq2)).

If 1), you just need to collapse the species-specific nodes into a single branch, choosing a method for summarising the distances therein (e.g. max branch length, average, sum, etc.). You could easily do this programmatically using any phyloinformatics toolkit. I use ETE, but it would also be possible with Biopython (Phylo), BioPerl, Bio::Phylo, etc.
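To make case 1 concrete, here is a minimal sketch, assuming leaf names are exactly the species labels and that each species forms a single clade. It deliberately avoids any toolkit and uses a small hand-written Newick parser, so `Node`, `parse_newick`, `collapse_species`, `to_newick` and the `summarise` argument are all hypothetical names for illustration, and the example tree is made up. Each single-species clade is replaced by one leaf whose branch length is the clade's own branch plus a summary (maximum by default) of the distances from the clade root to its tips.

```python
import re

PUNCT = set("(),:;")

class Node:
    def __init__(self, name="", dist=0.0):
        self.name = name
        self.dist = dist
        self.children = []

def parse_newick(text):
    """Parse plain Newick (names and branch lengths only; no quotes or comments)."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        node = Node()
        if tokens[pos] == "(":
            pos += 1  # consume "("
            node.children.append(parse_node())
            while tokens[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            pos += 1  # consume ")"
        if pos < len(tokens) and tokens[pos] not in PUNCT:
            node.name = tokens[pos]
            pos += 1
        if pos < len(tokens) and tokens[pos] == ":":
            node.dist = float(tokens[pos + 1])
            pos += 2
        return node

    return parse_node()

def _collapse(node, summarise):
    """Post-order pass returning (new node, species or None if mixed, tip depths)."""
    if not node.children:
        return Node(node.name, node.dist), node.name, [0.0]
    results = [_collapse(child, summarise) for child in node.children]
    species = {sp for _, sp, _ in results}
    if len(species) == 1 and None not in species:
        # Every leaf below belongs to one species: replace the clade by one leaf.
        depths = [child.dist + d
                  for child, (_, _, child_depths) in zip(node.children, results)
                  for d in child_depths]
        name = species.pop()
        return Node(name, node.dist + summarise(depths)), name, depths
    merged = Node(node.name, node.dist)
    merged.children = [new for new, _, _ in results]
    return merged, None, []

def collapse_species(root, summarise=max):
    """Collapse every single-species clade; `summarise` maps tip depths to one number."""
    return _collapse(root, summarise)[0]

def to_newick(node, is_root=True):
    text = node.name
    if node.children:
        text = "(" + ",".join(to_newick(c, False) for c in node.children) + ")" + text
    return text + ";" if is_root else f"{text}:{node.dist:g}"

# Made-up example: three individuals of species A, two of species B.
tree = parse_newick("((A:1,(A:2,A:2):1):0.5,(B:3,B:3):0.5);")
print(to_newick(collapse_species(tree)))
```

Passing a different function, for example `statistics.mean`, changes how the within-species distances are summarised. Which summary is appropriate depends on what the branch lengths represent in your tree.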

If 2), you would need to decompose your gene tree into all possible species subtrees. The TreeKO methodology is good for this, and I recently implemented it in ETE, so it can also be used programmatically. In brief, you decompose your tree into multiple subtrees using the tree.get_speciation_trees() function, then build a consensus from the resulting subtrees. For the consensus, you could just compute a distance matrix averaging the all-against-all distances observed among the species nodes, or build a consensus tree (check Biopython for this).
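The distance-matrix averaging for case 2 could look roughly like the sketch below. It does not perform the decomposition itself: it assumes the speciation subtrees have already been obtained and converted to a simple nested-tuple form, where a leaf is `(species, branch_length)` and an internal node is `([children], branch_length)`. That representation, the function names `leaf_depths` and `mean_species_distances`, and the two example subtrees are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def leaf_depths(node, pair_sums, pair_counts):
    """Return [(species, distance from leaf to this node's parent)].

    While walking up, record the path length between every pair of leaves
    that meet at this node and belong to different species.
    """
    content, dist = node
    if isinstance(content, str):  # leaf: content is the species label
        return [(content, dist)]
    per_child = [leaf_depths(child, pair_sums, pair_counts) for child in content]
    for left, right in combinations(per_child, 2):
        for sp_a, d_a in left:
            for sp_b, d_b in right:
                if sp_a != sp_b:
                    key = tuple(sorted((sp_a, sp_b)))
                    pair_sums[key] += d_a + d_b
                    pair_counts[key] += 1
    # Add this node's own branch so depths are measured to its parent.
    return [(sp, d + dist) for leaves in per_child for sp, d in leaves]

def mean_species_distances(subtrees):
    """Average the leaf-to-leaf path lengths per species pair over all subtrees."""
    sums, counts = defaultdict(float), defaultdict(int)
    for tree in subtrees:
        leaf_depths(tree, sums, counts)
    return {pair: sums[pair] / counts[pair] for pair in sums}

# Two made-up subtrees, as might result from decomposing a gene tree.
subtree_1 = ([([("spA", 1.0), ("spB", 2.0)], 0.5), ("spC", 4.0)], 0.0)
subtree_2 = ([("spA", 2.0), ("spB", 3.0)], 0.0)

distances = mean_species_distances([subtree_1, subtree_2])
for pair, value in sorted(distances.items()):
    print(pair, value)
```

The resulting dictionary is one averaged distance per species pair, which can then be passed to whatever distance-based tree-building method you prefer.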