12 weeks ago

Hello everyone,

I am trying to infer phylogenetic relationships between a set of approximately 60 proteins. They are all putative antibacterial proteins, but their sequences (and predicted functions) are different. Within this set, there are multiple subgroups: for example, there are tRNases (which cleave tRNA molecules), rRNases (which cleave rRNA molecules), DNases, and so on. Proteins within each subgroup have high homology with each other (>90%), but the homology between two proteins from different subgroups is very low (5-10%). There are also proteins (~15-20) that are different from all the others and do not fall into any subgroup.
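As a point of reference, one common way to put a number on the similarity between two aligned sequences is percent identity over the columns where neither sequence has a gap. The snippet below is a minimal sketch of that calculation; it is an assumption for illustration and not necessarily how the percentages above were obtained. The helper name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are skipped (one of several
    possible conventions; others divide by the full alignment length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x.upper() == y.upper():
            matches += 1

    # Avoid division by zero when no ungapped column exists.
    return 100.0 * matches / compared if compared else 0.0


# Toy example: 5 ungapped columns, 4 of them identical.
print(percent_identity("MKT-LV", "MKSALV"))
```

Note that the value depends on the alignment itself and on how gaps are counted, so figures in the 5-10% range are hard to distinguish from what unrelated sequences would give.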

My current workflow to infer a phylogenetic relationship has been to align all 60 proteins using MUSCLE, build a tree from the MUSCLE alignment in PhyML (default settings), and then visualize the tree. I observed clear clustering for proteins with the same predicted function and this has been helpful.
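For concreteness, the workflow above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes MUSCLE v3 command-line syntax (`-in`/`-out`; MUSCLE v5 uses different options) and PhyML's `-i` and `-d aa` options, with everything else left at defaults. The file names and the helper functions `read_fasta`, `to_phylip`, and `workflow_commands` are hypothetical. The commands are built as argument lists and not executed here.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            name = (line[1:].split() or ["unnamed"])[0]
            records.append([name, ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def to_phylip(records):
    """Format aligned records as relaxed PHYLIP, one sequence per line."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    lines = [f"{len(records)} {lengths.pop()}"]
    lines += [f"{name} {seq}" for name, seq in records]
    return "\n".join(lines) + "\n"


def workflow_commands(fasta_in, aligned_fasta, phylip_out):
    """Return the two external commands as argument lists (not run here)."""
    # Step 1: align all proteins (MUSCLE v3 syntax, an assumption).
    muscle = ["muscle", "-in", fasta_in, "-out", aligned_fasta]
    # Step 2: build a tree from the PHYLIP alignment, amino-acid data.
    phyml = ["phyml", "-i", phylip_out, "-d", "aa"]
    return muscle, phyml


muscle_cmd, phyml_cmd = workflow_commands("proteins.fa", "proteins.aln.fa", "proteins.phy")
print(" ".join(muscle_cmd))
print(" ".join(phyml_cmd))
```

In practice each command would be run with something like `subprocess.run(cmd, check=True)`, with the aligned FASTA converted to PHYLIP between the two steps.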

However, I am worried that a MUSCLE alignment of proteins with such low homology might not be accurate. I am mostly worried about the relationships that are inferred between the distantly related proteins. Do you think that this is an OK method? If not, is there anything else that you would recommend?

Thank you in advance, Richard

