I have a collection of protein orthologous groups (OGs) output by OrthoMCL, and I would like to get some idea of the relationships between these OGs. One way I can think of to approach this is to build a consensus sequence for each OG, then build a phylogenetic tree from an MSA of these consensuses (consensii?).
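To make "consensus sequence" concrete, here is a minimal sketch of what I have in mind: a simple majority-rule consensus over an already-aligned OG. The function name `majority_consensus`, the toy sequences, and the choice to drop gap-majority columns are all my own assumptions for illustration, not anything OrthoMCL produces.

```python
from collections import Counter


def majority_consensus(aligned_seqs, gap="-"):
    """Majority-rule consensus of equal-length, pre-aligned sequences.

    Hypothetical helper: columns where the gap is the most common
    character are dropped from the consensus.
    """
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same aligned length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        # Most frequent character; ties are broken by character order
        # so the result is deterministic.
        residue = min(counts, key=lambda r: (-counts[r], r))
        if residue != gap:
            consensus.append(residue)
    return "".join(consensus)


# Toy alignment standing in for one OG (made-up sequences).
toy_og = ["MKT-LLV", "MKTALLV", "MRT-LLI"]
consensus = majority_consensus(toy_og)
print(consensus)
```

Each OG would get one such consensus, and those consensuses would then be aligned against each other to build the tree.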
However, I'm not sure this will actually mean anything. As I understand it, classical trees are constructed on the assumption that the sequences in the tree are orthologous. Orthologs are proteins separated by speciation rather than gene duplication, so differences in sequence between two orthologs can be assumed to represent the impact of speciation.
If the proteins (or their consensuses) are not orthologous, which would presumably be the case if they are derived from separate OGs, would a tree such as I describe (or its distance matrix) illustrate the relative number of non-orthologous gene duplications between the OG consensuses? That is, would it give an idea of when two proteins diverged from some common ancestral sequence and then started generating orthologs as their host organisms speciated? Or would any potential stretches of alignment just be random noise?
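For reference, the kind of distance matrix I mean could be sketched like this. It uses the plain p-distance (fraction of differing positions, ignoring gapped positions), which is only one possible choice and is my assumption here. The OG names and aligned consensus strings are invented for illustration.

```python
def p_distance(a, b, gap="-"):
    """Fraction of mismatched positions between two aligned sequences,
    counting only positions where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same aligned length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return float("nan")  # no comparable positions at all
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """All-against-all p-distances, keyed by (name_i, name_j)."""
    names = list(aligned)
    return {(i, j): p_distance(aligned[i], aligned[j])
            for i in names for j in names}


# Made-up aligned consensuses for three hypothetical OGs.
toy_consensuses = {
    "OG1": "MKTLLV",
    "OG2": "MKTLIV",
    "OG3": "AQ-WPG",
}
dm = distance_matrix(toy_consensuses)
for (i, j), d in sorted(dm.items()):
    print(i, j, round(d, 3))
```

My worry is whether the large values in such a matrix (like OG1 vs OG3 in this toy case) carry any evolutionary signal, or whether they are indistinguishable from what two unrelated sequences would give.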
I am working on the assumption that all proteins will share some evolutionary relationship, even if it is very faint and stretches back to the first protein in the primordial soup - but that could be wrong! I am also very new to phylogenetics, so may have butchered some of the theory :P