I have recently been annotating a set of proteins with BLAST. They all belong to the same large protein family but to different subfamilies, and my goal is to assign each protein to its correct subfamily. My procedure is: if a protein's top BLAST hit falls in a given subfamily, I annotate the protein as a member of that subfamily. I then build a phylogeny with FastTree to check whether the proteins I assigned to the same subfamily cluster together.

I thought this approach would annotate the proteins correctly, but when I submitted my paper to a journal, a reviewer pointed out that my annotation process led to some wrong conclusions. He said that although subfamily A is more similar to B than to C in BLAST alignments, previous experiments have shown that A and B have different functions, whereas A can be replaced by C. So A and C are supposed to be homologous, while A and B are not.

My questions are:

1. Why does A have higher sequence similarity to B than to C, even though A and B function differently while A and C function similarly?
2. If A, B and C belong to the same large protein family but to different subfamilies, and A and B function differently while A and C function similarly, does that mean A and C share a last common ancestor while A and B do not?
3. Is it true that annotating proteins with motifs is more accurate than annotating them with BLAST?

These questions have been confusing me for some time, and I would appreciate any help!
I think you should build a phylogenetic tree for the whole family, which would clarify ancestry and subfamily membership from a phylogenetic point of view. Rather than relying on pairwise BLAST hits, start from a multiple sequence alignment of all the family members and infer the tree from that. You can also use motifs to annotate proteins and group them into families (as Pfam does, for example), but that alone will not give you information about the phylogeny.
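
Once you have a family-wide tree, one way to compare it with the top-hit annotations is to ask whether each annotated subfamily sits on one side of a single branch. Below is a minimal Python sketch of that check, not a definitive implementation. The tree string, the sequence names (`A1`, `B1`, ...) and the `subfamily_of` mapping are all made up for illustration, and the small Newick parser assumes unquoted labels with no commas or parentheses in them; for real data a library such as Biopython's `Bio.Phylo` would be a safer way to read the tree.

```python
def parse_newick(text):
    """Parse a simple Newick string: a leaf becomes a str, an internal node a list."""
    text = text.strip()
    pos = 0

    def read_label():
        # Read "name:length" up to the next delimiter and keep only the name.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),;":
            pos += 1
        return text[start:pos].split(":")[0].strip()

    def parse_node():
        nonlocal pos
        if text[pos] != "(":
            return read_label()          # leaf
        pos += 1                         # skip "("
        children = [parse_node()]
        while text[pos] == ",":
            pos += 1
            children.append(parse_node())
        if text[pos] != ")":
            raise ValueError("malformed Newick near position %d" % pos)
        pos += 1                         # skip ")"
        read_label()                     # internal label or support value, ignored
        return children

    return parse_node()


def collect_clades(node, out):
    """Append the leaf set below every node to `out`; return this node's leaf set."""
    if isinstance(node, str):
        leaves = frozenset([node])
    else:
        leaves = frozenset().union(*(collect_clades(child, out) for child in node))
    out.append(leaves)
    return leaves


def check_subfamilies(newick, subfamily_of):
    """Map each subfamily to True if its members form one side of a single branch."""
    clade_sets = []
    all_leaves = collect_clades(parse_newick(newick), clade_sets)
    clade_lookup = set(clade_sets)
    members = {}
    for leaf in all_leaves:
        members.setdefault(subfamily_of[leaf], set()).add(leaf)
    result = {}
    for name, group in members.items():
        group = frozenset(group)
        # The tree is treated as unrooted, so the complement is also accepted.
        result[name] = group in clade_lookup or (all_leaves - group) in clade_lookup
    return result


# Hypothetical example: C1 groups with the B sequences instead of with C2.
tree = "((A1:0.1,A2:0.1)0.9:0.2,(C1:0.1,(B1:0.1,B2:0.1)0.8:0.1)0.7:0.2,C2:0.3);"
subfamily_of = {"A1": "A", "A2": "A", "B1": "B", "B2": "B", "C1": "C", "C2": "C"}
result = check_subfamilies(tree, subfamily_of)
for name, ok in sorted(result.items()):
    print(name, "consistent with tree" if ok else "split across the tree")
```

In this toy tree, A and B each come out as a single group, while C is split, which is the kind of disagreement between top-hit annotation and phylogeny worth inspecting by hand. The check ignores branch support, so a poorly supported branch can make a group look split or intact by chance.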