Homology means shared evolutionary ancestry. Sequence similarity is often used as a proxy for homology but inferences should be made with care.
For two genes or proteins to be considered homologous, the similarity between them must be not just high but statistically significant, as measured by a metric such as the E-value.
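As a rough illustration of what an E-value expresses, here is a minimal sketch based on the standard Karlin-Altschul relation used by BLAST-style searches, E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the total database length. The function name, the example numbers and the 1e-3 cutoff are assumptions for illustration only; real search programs also apply length corrections that are omitted here.

```python
def evalue_from_bit_score(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S'), ignoring the edge-effect length
    corrections that real search tools apply.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical hit: bit score 50, 300-residue query, 10**9-residue database.
e_value = evalue_from_bit_score(50, 300, 10**9)

# The cutoff is an assumed, commonly used threshold, not a universal rule.
is_significant = e_value < 1e-3
print(f"E-value = {e_value:.2e}, significant: {is_significant}")
```

Note that the same bit score gives a larger (less significant) E-value in a larger database, which is why significance, not raw similarity, is the quantity to look at.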
INFERRING HOMOLOGY FROM SIMILARITY
The concept of homology – common evolutionary ancestry – is central to
computational analyses of protein and DNA sequences, but the link
between similarity and homology is often misunderstood. We infer
homology when two sequences or structures share more similarity than
would be expected by chance; when excess similarity is observed, the
simplest explanation for that excess is that the two sequences did not
arise independently, they arose from a common ancestor. Common
ancestry explains excess similarity (other explanations require
similar structures to arise independently); thus excess similarity
implies common ancestry.
However, homologous sequences do not always share significant sequence
similarity; there are thousands of homologous protein alignments that
are not significant, but are clearly homologous based on statistically
significant structural similarity or strong sequence similarity to an
intermediate sequence. Thus, when a similarity search finds a
statistically significant match, we can confidently infer that the two
sequences are homologous; but if no statistically significant match is
found in a database, we cannot be certain that no homologs are present.
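The notion of "more similarity than would be expected by chance" in the passage above can be made concrete with a shuffling test: score the real pair, then score the query against many shuffled copies of the other sequence and see how often chance does as well. The sketch below is only an illustration under assumed parameters (a simple match/mismatch scheme with a linear gap penalty, 200 shuffles, a made-up peptide); production tools use substitution matrices such as BLOSUM62 and analytical statistics instead.

```python
import random


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffle_p_value(query, subject, n_shuffles=200, seed=0):
    """Empirical p-value: how often a shuffled subject scores as well."""
    rng = random.Random(seed)
    observed = local_score(query, subject)
    letters = list(subject)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys order
        if local_score(query, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_shuffles + 1)


# Made-up example peptide compared with itself.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
score, p = shuffle_p_value(seq, seq)
print(score, p)
```

A small p-value supports an inference of homology; a large one, as the passage stresses, does not prove that the sequences are unrelated.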
Members of a protein family descend from a common ancestor and are hence homologous. In the course of evolution, however, they may have acquired new domains or reshuffled existing ones, so that their sequences are no longer similar along their full length. Proteins that share full-length sequence similarity are called homeomorphic (Wu et al., 2004). Members of a protein family may therefore be homologous without being homeomorphic. Conversely, homeomorphic proteins can evolve independently and therefore may not be considered homologous.
Identifying homologous proteins is, therefore, not a simple task. Machine learning algorithms are used to improve the identification of homologous proteins; some of them are described in the linked papers.
In general, global rather than local similarity should be considered when identifying homeomorphs. See https://biology.stackexchange.com/q/11263/3340
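To show why the global/local distinction matters here, the sketch below scores a toy pair of sequences that share one "domain" but differ everywhere else. The sequences, the function name and the scoring values (match 2, mismatch -1, linear gap -2) are all assumptions made up for illustration; it is not how any particular tool scores alignments.

```python
def align_score(a, b, local, match=2, mismatch=-1, gap=-2):
    """Alignment score: Smith-Waterman if `local`, else Needleman-Wunsch."""
    # First row: zeros for local, accumulated gap penalties for global.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i, ca in enumerate(a, start=1):
        curr = [0 if local else i * gap]
        for j, cb in enumerate(b, start=1):
            score = max(
                prev[j - 1] + (match if ca == cb else mismatch),
                prev[j] + gap,
                curr[j - 1] + gap,
            )
            if local:
                score = max(score, 0)  # a local alignment may restart
                best = max(best, score)
            curr.append(score)
        prev = curr
    # Local: best cell anywhere; global: the bottom-right cell.
    return best if local else prev[-1]


# Toy proteins sharing one domain but with different flanking regions.
shared_domain = "HEAGAWGHEE"
protein_a = "PPPPPPPPPP" + shared_domain
protein_b = shared_domain + "KKKKKKKKKK"

print("local :", align_score(protein_a, protein_b, local=True))
print("global:", align_score(protein_a, protein_b, local=False))
```

The local score picks up the shared domain, while the global score is dragged down by the unrelated flanks: such a pair can share a homologous domain without being homeomorphic.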
I don't know the proteins in your example, but if they are from the same protein family, then they are homologous. As someone else pointed out, these genes are indeed paralogs.