We are trying to compare the rates of evolution across a set of 20 proteins. We are interested in these proteins because they show non-synonymous changes between two closely related species, and we want to identify which of these changes may be the result of positive selection. One concern is that these proteins may simply be fast-evolving in general (for a variety of possible reasons). To test this, we would like a measure of how quickly these proteins are evolving across as wide a range of species as possible, and to compare it with the same measure for a random sample of proteins.
I am not familiar with this type of analysis and imagine it is not straightforward. As far as I am aware, there is no database that summarises any kind of 'rate of evolution' for a given gene. I would appreciate any advice, particularly suggestions for a workflow or pipeline, or for available software. I imagine the best approach would be to download the amino acid or nucleotide sequences of these proteins from a variety of species, construct a phylogeny for each protein, and then somehow summarise these phylogenies so that branch lengths can be compared between trees.
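To make the "summarise the phylogenies" idea concrete, here is a rough sketch of what I have in mind, in Python with only the standard library. It sums the branch lengths of each tree (total tree length) and asks where a candidate protein falls among a random sample. The Newick strings, species and function names are made up for illustration. It also assumes every tree covers the same set of species and that labels contain no colons, which real data may not satisfy.

```python
import re

# Matches the number after each ':' in a Newick string (the branch length).
BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def tree_length(newick):
    """Total tree length: the sum of all branch lengths in a Newick string."""
    return sum(float(x) for x in BRANCH_LENGTH.findall(newick))


def fraction_at_or_below(value, background):
    """Fraction of background values that are <= value (an empirical percentile)."""
    return sum(b <= value for b in background) / len(background)


# Made-up trees, all over the same three species so the lengths are comparable.
candidate_tree = "((human:0.02,chimp:0.03):0.10,mouse:0.35);"
random_trees = [
    "((human:0.01,chimp:0.01):0.05,mouse:0.20);",
    "((human:0.02,chimp:0.02):0.08,mouse:0.30);",
    "((human:0.05,chimp:0.04):0.20,mouse:0.60);",
    "((human:0.01,chimp:0.02):0.04,mouse:0.15);",
]

candidate_length = tree_length(candidate_tree)
background_lengths = [tree_length(t) for t in random_trees]
candidate_rank = fraction_at_or_below(candidate_length, background_lengths)
print(candidate_length, candidate_rank)
```

A candidate that sits near the top of the background distribution would suggest the protein is fast-evolving in general, which is the concern described above. Whether total tree length is the right summary is part of what I am asking.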
However, it might also be useful to consider the ratio of non-synonymous to synonymous changes (dN/dS), as we are primarily interested in whether the non-synonymous changes we observe are unexpected. A fast-evolving protein that is accumulating only synonymous changes would lead to a very different inference than one with frequent non-synonymous changes across the phylogeny.
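As a toy illustration of the distinction, the sketch below counts synonymous and non-synonymous codon differences between two aligned coding sequences using the standard genetic code. This is only a raw count, not a dN/dS estimate: it does not normalise by the number of synonymous and non-synonymous sites, and it classifies codons that differ at several positions only by the resulting amino acid. I assume a proper analysis would use a dedicated tool such as PAML's codeml. The sequences and function name are invented for the example.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, non_synonymous) codon differences.

    Both sequences must be aligned, in frame and of equal length. Codons
    containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq_a) - 2, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Made-up aligned coding sequences from two species.
species_a = "ATGCTTAAAGGG"
species_b = "ATGCTCAGAGGA"
changes = count_codon_changes(species_a, species_b)
print(changes)
```

Here CTT to CTC and GGG to GGA leave the amino acid unchanged, while AAA to AGA swaps lysine for arginine, so the pair has two synonymous differences and one non-synonymous difference.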
A totally different approach would be to average the per-base conservation score across each gene, as a summary of how constrained the sequence is across species. I wonder whether this isn't a more relevant approach for our question, although it lacks the information that a phylogenetic tree makes visible.
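For the conservation-score idea, the calculation I am picturing is roughly the following sketch. It assumes the per-base scores are already loaded into a list indexed by position, with None where no score is available, and that each gene is described by its exon intervals (0-based, end-exclusive). The scores, coordinates and function name are all invented for illustration.

```python
from statistics import mean


def mean_conservation(scores, exons):
    """Average the per-base scores over one gene's exons.

    scores: list of per-base scores, None where a base has no score.
    exons:  list of (start, end) intervals, 0-based and end-exclusive.
    Returns None if no scored base falls inside the exons.
    """
    values = [
        score
        for start, end in exons
        for score in scores[start:end]
        if score is not None
    ]
    return mean(values) if values else None


# Made-up scores for a short stretch of sequence and one two-exon gene.
region_scores = [0.9, 0.8, None, 0.1, 0.2, 1.0, 1.0, 0.7]
gene_exons = [(0, 3), (5, 8)]
gene_score = mean_conservation(region_scores, gene_exons)
print(gene_score)
```

The resulting per-gene averages for the 20 proteins could then be compared with those of a random sample of proteins, in the same way as any other per-gene summary.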
Any questions, thoughts, comments or suggestions are welcome.
Thanks in advance!