This is something of an emergency: I have to hand in my MSc internship report in one week, and there is an analysis that would be a very nice addition to it.
What I need
I am using the Ensembl database and want to know which branches in the gene trees are subject to selection. Since this is very computationally intensive (there are more than 22,000 gene trees in Ensembl), I didn't want to run PAML (the codeml program) myself, but rather reuse existing results (Selectome). Unfortunately, the Ensembl version used in Selectome is old, and mapping each branch from the old version to the more recent one yields very few matches.
That is why I am asking this question.
What I think I should do
My supervisor then suggested that I use the pairwise dN/dS values (already available in Ensembl) and put together my own recipe to infer dN/dS on a per-branch basis:
As I understand it, I would need the dN and dS values themselves, not only the dN/dS ratios, and that already seems like a complication.
Example with a tree of 3 genes: get the dN and dS between the 2 closest genes, then between the most distant gene and each of the 2 others. Then compute dN and dS on the branch leading to the 2 closest genes by subtraction.
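To make the 3-gene example concrete, here is a minimal sketch of one way to read "by subtraction". It assumes the pairwise distances are additive along the tree, which is the assumption I am unsure about. The gene labels A and B (the 2 closest genes) and C (the most distant one), the function names, and the numbers are all made up for illustration; nothing here comes from Ensembl or codeml. Note that with only 3 genes the distances cannot place the root, so the third length is the whole path from the ancestor of A and B to C, not only the internal branch.

```python
def three_point_branches(d_ab, d_ac, d_bc):
    """Split three pairwise distances into three branch lengths, assuming additivity."""
    a = (d_ab + d_ac - d_bc) / 2  # tip A to the ancestor of (A, B)
    b = (d_ab + d_bc - d_ac) / 2  # tip B to the ancestor of (A, B)
    c = (d_ac + d_bc - d_ab) / 2  # ancestor of (A, B) to tip C (root position unknown)
    return a, b, c


def per_branch_omega(dn, ds):
    """dn and ds are dicts keyed 'AB', 'AC', 'BC' holding pairwise dN and dS."""
    dn_branches = three_point_branches(dn["AB"], dn["AC"], dn["BC"])
    ds_branches = three_point_branches(ds["AB"], ds["AC"], ds["BC"])
    result = {}
    for name, n, s in zip(("A", "B", "C_path"), dn_branches, ds_branches):
        # The ratio is undefined when the estimated dS is zero or negative,
        # which can happen when the pairwise values are not additive.
        omega = n / s if s > 0 else None
        result[name] = {"dN": n, "dS": s, "omega": omega}
    return result


# Made-up pairwise values, for illustration only.
example_dn = {"AB": 0.02, "AC": 0.06, "BC": 0.07}
example_ds = {"AB": 0.10, "AC": 0.40, "BC": 0.44}
for branch, values in per_branch_omega(example_dn, example_ds).items():
    print(branch, values)
```

The subtraction has to be done on dN and dS separately and the ratio taken afterwards, which is why the dN/dS ratios alone would not be enough.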
This sounds odd to me. Is it even a valid approximation? I think it is equivalent to calculating the dN/dS ratio on the internal branch after inferring the ancestral sequence by parsimony. How bad would that be compared to the likelihood computation performed by codeml in the branch-site analysis?
Thanks a lot for your insights.