I have gene trees with branch lengths expressed as molecular distances. Each tree contains speciation nodes, whose absolute ages are known, and gene duplication nodes, whose absolute ages are unknown. These trees are far from ultrametric (meaning the root-to-leaf distance differs depending on the leaf).
I would like to estimate the approximate age of the duplication nodes. I started with a crude method that resembles UPGMA (actually it's closer to WPGMA, but the idea is similar): starting from a speciation node, I walk down the tree toward the leaves, and each time I reach a duplication node I average the lengths of its descendant branches. I can then rescale the resulting ultrametric distances using the known absolute ages of the speciations.
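To make the procedure concrete, here is a minimal sketch of the averaging-and-rescaling idea as I understand it. The `Node` class and the function names are hypothetical, invented for illustration only. It assumes every leaf and speciation node carries a known `age`, that duplication nodes have `age=None`, and that all calibrated nodes below the starting speciation share the same age.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    # Hypothetical minimal tree node, not taken from any library.
    name: str
    dist: float = 0.0            # branch length to the parent (molecular distance)
    age: Optional[float] = None  # known for leaves/speciations, None for duplications
    children: List["Node"] = field(default_factory=list)


def mean_depth(node: Node) -> float:
    """Mean molecular distance from `node` down to the nearest calibrated nodes.

    Each child path gets equal weight at every node (WPGMA-like averaging).
    """
    depths = []
    for child in node.children:
        below = 0.0 if child.age is not None else mean_depth(child)
        depths.append(child.dist + below)
    return sum(depths) / len(depths)


def calibrated_ages_below(node: Node) -> Set[float]:
    """Ages of the first calibrated nodes met on each path below `node`."""
    ages: Set[float] = set()
    for child in node.children:
        if child.age is not None:
            ages.add(child.age)
        else:
            ages |= calibrated_ages_below(child)
    return ages


def date_duplications(spec: Node) -> Dict[str, float]:
    """Estimate ages of the duplication nodes under the calibrated node `spec`."""
    below = calibrated_ages_below(spec)
    if len(below) != 1:
        # Assumption of this sketch: one common age below the starting node.
        raise ValueError("calibrated nodes below must all share one age")
    base = below.pop()
    total = mean_depth(spec)  # averaged height of the whole subtree
    ages: Dict[str, float] = {}
    stack = [c for c in spec.children if c.age is None]
    while stack:
        dup = stack.pop()
        # Rescale the averaged height linearly between `base` and `spec.age`.
        ages[dup.name] = base + (spec.age - base) * mean_depth(dup) / total
        stack.extend(c for c in dup.children if c.age is None)
    return ages


# Toy tree ((A:0.2,B:0.4)D:0.3,C:0.9)S with S dated at 10 and leaves at 0.
dup = Node("D", 0.3, children=[Node("A", 0.2, age=0.0), Node("B", 0.4, age=0.0)])
root = Node("S", age=10.0, children=[dup, Node("C", 0.9, age=0.0)])
print(date_duplications(root))  # D comes out at roughly 4.0
```

Note that this naive rescaling can place a duplication above its parent speciation when one lineage is much longer than the average, so it is only a rough starting point.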
However, some speciation nodes are missing from these trees, so below a duplication node it is possible that one descendant is a speciation at time t1 while the other is a different speciation at time t2 ≠ t1.
Here is a little sketch to illustrate.
If all speciation nodes were present, I would try to reconstruct an ultrametric tree that looks like this:
              |------ S1
         |----|
         |    |------ S1
     ----|
         |----------- S1
But because some speciation nodes are missing, the tree I actually want to reconstruct should look like this:
              |------ S1
         |----|
         |    |------ S1
     ----|
         |------------------- S2
Is there a simple adaptation I can apply to my method to take this into account?
I know there are sophisticated methods for inferring divergence times from molecular data, but some of them are parametric (with varying rates of evolution and likelihood inference) and seem too computationally intensive for the number of trees I have to process. I am currently reading Sanderson 1997 (a nonparametric method) and Sanderson 2002 (a semiparametric method) to see whether I could apply them, but for now I'd prefer to start simple and fast. That said, I would be happy to hear suggestions of state-of-the-art methods or reviews comparing multiple methods :)