Phylogenetic distance between species
5.4 years ago
Jautis ▴ 300

Hi, I have a phylogenetic tree derived from genotype information from multiple individuals per species. It looks something like this:

(((Spec1:1.77,(Spec1:31,Spec1:31):1.77):0,(Spec2:4.17,(Spec2:3.14,(Spec2:27.24,Spec2:27.24):3.14):4.17):0):0,(Spec3:1.8,(Spec3:0.4,(Spec3:0.7,(Spec3:18.3,Spec3.1:18.3):0.7):0.4):1.8):0)

How would I simplify this tree so that it captures only the distances between species? I.e., produce a tree like ((Spec1:x1,Spec2:x2),Spec3:x3)?

Is there an efficient way to do this for a large tree?

phylogenetics newick trees • 1.7k views
5.4 years ago
jhc ★ 2.9k

Your tree resembles a regular gene tree with duplications. However, it's not clear to me if: 1) the duplicated items are always like in your example (all branches from the same species are grouped together) or 2) you could also have complex patterns like ((spAseq1, spBseq1), (spAseq2, spBseq2)).

If 1), you just need to collapse the species-specific nodes into a single branch, choosing a method for summarising the distances within them (e.g. max branch length, average, sum). You could easily do this programmatically using any phyloinformatics toolkit. I use ETE, but it would also be possible with Biopython (Bio.Phylo), BioPerl, Bio::Phylo, etc.
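
To make case 1) concrete, here is a minimal sketch in plain Python (standard library only, so it is not the ETE or Biopython API). It assumes a simple Newick string without quoted labels, and that each leaf label is the species name; `parse_newick`, `collapse_species`, `species_of` and `summarise` are hypothetical names introduced for illustration. Every clade whose leaves all belong to one species is replaced by a single leaf whose branch length is the clade's own branch plus a summary (mean by default) of the root-to-tip distances inside it.

```python
import re
from statistics import mean

class Node:
    def __init__(self, name="", length=0.0, children=None):
        self.name, self.length, self.children = name, length, children or []

def parse_newick(text):
    """Parse a simple Newick string (no quoted labels, no comments)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children.append(parse_node())
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(parse_node())
            pos += 1                      # consume ")"
        name, length = "", 0.0
        if pos < len(tokens) and tokens[pos] not in {"(", ")", ",", ";"}:
            label, _, dist = tokens[pos].partition(":")
            name, length = label, float(dist) if dist else 0.0
            pos += 1
        return Node(name, length, children)

    return parse_node()

def leaf_depths(node):
    """Yield (leaf name, distance from `node` down to that leaf)."""
    if not node.children:
        yield node.name, 0.0
    for child in node.children:
        for name, depth in leaf_depths(child):
            yield name, depth + child.length

def collapse_species(node, species_of=lambda name: name, summarise=mean):
    """Replace each single-species clade by one leaf with a summarised length."""
    leaves = list(leaf_depths(node))
    species = {species_of(name) for name, _ in leaves}
    if len(species) == 1:
        depth = summarise([d for _, d in leaves])
        return Node(species.pop(), node.length + depth)
    kids = [collapse_species(c, species_of, summarise) for c in node.children]
    return Node(node.name, node.length, kids)

def to_newick(node):
    label = f"{node.name}:{node.length:g}"
    if node.children:
        return "(" + ",".join(to_newick(c) for c in node.children) + ")" + label
    return label

# Toy tree (made-up numbers): three individuals of A, two of B.
tree = parse_newick("((A:1,(A:1,A:1):3):0.5,(B:3,B:1):0.5);")
collapsed = to_newick(collapse_species(tree))
print(collapsed + ";")   # (A:3.5,B:2.5):0;
```

Passing `summarise=max` or `summarise=min` switches the summary method, and `species_of` can be any function that maps an individual's label to its species (for example, splitting on an underscore).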

If 2), you would need to decompose your gene tree into all possible species subtrees. The TreeKO methodology is good for this, and I recently implemented it in ETE so it can also be used programmatically. In brief, you decompose your tree into multiple subtrees using the tree.get_speciation_trees() function, and then build some kind of consensus from the resulting subtrees. For the consensus, you could just compute a distance matrix by averaging the all-against-all distances observed among the species nodes, or build a consensus tree (check Biopython for this).
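
The averaging step for the distance matrix could look roughly like the sketch below. It is plain Python and does not use TreeKO or ETE; the nested-tuple tree layout, `species_distances` and `species_of` are assumptions made up for this example. A leaf is `(name, branch_length)` and an internal node is `([children], branch_length)`. For every pair of leaves from different species it takes the path length through the tree and averages those values per species pair. With several subtrees, you would pool the per-pair distances from all of them before averaging.

```python
from collections import defaultdict
from itertools import combinations

def leaf_paths(node, key=()):
    """Yield (leaf name, [(branch key, length), ...]) from the root to each leaf.
    A branch key is the tuple of child indices leading to that branch."""
    content, length = node
    here = [(key, length)]
    if isinstance(content, str):          # leaf: content is the label
        yield content, here
    else:                                 # internal node: content is a list
        for i, child in enumerate(content):
            for name, rest in leaf_paths(child, key + (i,)):
                yield name, here + rest

def patristic(path_a, path_b):
    """Sum the branches found on exactly one of the two root-to-leaf paths."""
    a, b = dict(path_a), dict(path_b)
    only_a = sum(length for k, length in a.items() if k not in b)
    only_b = sum(length for k, length in b.items() if k not in a)
    return only_a + only_b

def species_distances(tree, species_of=lambda name: name):
    """Average leaf-to-leaf distance for every pair of distinct species."""
    pooled = defaultdict(list)
    for (n1, p1), (n2, p2) in combinations(list(leaf_paths(tree)), 2):
        s1, s2 = species_of(n1), species_of(n2)
        if s1 != s2:
            pooled[tuple(sorted((s1, s2)))].append(patristic(p1, p2))
    return {pair: sum(d) / len(d) for pair, d in pooled.items()}

# Toy tree (made-up numbers): ((A:1,(A:1,A:1):3):0.5,(B:3,B:1):0.5,C:2)
toy = ([
    ([("A", 1.0), ([("A", 1.0), ("A", 1.0)], 3.0)], 0.5),
    ([("B", 3.0), ("B", 1.0)], 0.5),
    ("C", 2.0),
], 0.0)

matrix = species_distances(toy)
for pair, dist in sorted(matrix.items()):
    print(pair, dist)
# ('A', 'B') 6.0
# ('A', 'C') 5.5
# ('B', 'C') 4.5
```

Because this averages over individuals directly, it does not require the individuals within a species to have equal branch lengths.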


Hi, sorry for the delayed response.

The duplicated items are individuals, not multiple genes from the species. So, for example, I have 3 individuals from species 1 which cluster together and three individuals from species 2 that cluster together, and what I want to know is the distance between species 1 and species 2. I think this would be straightforward if branch lengths of individuals within a species were the same, but they are not.