Question

Are these two proteins homologs?

0

5.6 years ago

utsafar ▴ 80

I wonder when it is appropriate to say that two proteins are homologs. In the current example, I checked the sequences of these two proteins in Pfam and, as you can see, they share some domains, but not all of them. Can I say that these proteins are homologs? How can I be confident in my decision?

[Image: Pfam domain results for the two proteins]

homology • 3.4k views

ADD COMMENT • link updated 5.6 years ago by Bharat Iyengar ▴ 330 • written 5.6 years ago by utsafar ▴ 80

0

How would you define the relationship between these two sequences at the moment? Would you call them homologous? How similar are these two protein sequences?

https://en.wikipedia.org/wiki/Sequence_homology

ADD REPLY • link 5.6 years ago by Sej Modha 5.3k

0

I define the relationship based on the domains shared between the two genes. These two proteins have 70 to 100 percent identity in the shared domains, and the rest of their sequences are not meaningfully similar.

ADD REPLY • link 5.6 years ago by utsafar ▴ 80

1

Sej Modha is asking about the ancestry relationship between the sequences. Are you comparing two proteins from different species, or from the same species?

ADD REPLY • link 5.6 years ago by h.mon 35k

0

The two proteins are from Arabidopsis thaliana: AT1G01040 and AT3G43920.

ADD REPLY • link 5.6 years ago by utsafar ▴ 80

0

Two proteins can be either paralogous or orthologous, not homologous. Homologous is a super term for paralogs and orthologs. When you say homologous, it is confusing.

ADD REPLY • link 5.6 years ago by cpad0112 21k

3

I'd say that's not strictly true; the proteins are/can be homologs. They do not 'share some homology', as it's an absolute term, which is where a lot of confusion arises. But whether they are paralogs or orthologs, they are both still homologs. I agree it's less clear-cut, but to say that they can't be homologous is misleading IMHO.

ADD REPLY • link 5.6 years ago by Joe 21k

0

Then are these two proteins homologs or not? Or maybe they are homologous in some domains? :)

ADD REPLY • link 5.6 years ago by utsafar ▴ 80

0

I am still confused about the following statement:

are these two proteins homologs or not?

It is not clear whether you'd like to state that these proteins are homologs of some protein X, or simply that these two proteins have certain shared regions, i.e. regions coding for the same domain.

ADD REPLY • link 5.6 years ago by Sej Modha 5.3k

0

Sorry for my poor English. I want to know whether, based on the Pfam results above, these two proteins are homologs of each other.

ADD REPLY • link 5.6 years ago by utsafar ▴ 80

1

Based only on that Pfam result, the answer is "probably", but you can't really say for certain. There are a lot of caveats. h.mon's answer below is the correct one, as this has already been determined.

ADD REPLY • link 5.6 years ago by Joe 21k

0

Looking at the screenshot of some colored boxes is not enough to decide if proteins are homologous or not. Check the genes of interest at the pATsi database.

http://cab.unina.it/athparalog/main2.html

ADD REPLY • link 5.6 years ago by h.mon 35k

0

I know about paralogy and orthology, duplication and speciation. I used the super term "homology" to avoid talking about differences between paralogs and orthologs. To be clearer, both of these genes are in Arabidopsis thaliana.

ADD REPLY • link 5.6 years ago by utsafar ▴ 80

0

I think this is cross-posted

https://biology.stackexchange.com/questions/77050/are-these-two-proteins-homolog

ADD REPLY • link 5.6 years ago by Bioinformatics_NewComer ▴ 330

0

Hello utsafar!

It appears that your post has been cross-posted to another site: https://biology.stackexchange.com/questions/77050/can-two-proteins-sharing-a-few-domains-be-considered-homologous

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 5.6 years ago by GenoMax 141k

0

I believe biostars should be merged with stackexchange. Since biostars is an independent platform we cannot moderate cross-posts and migrate questions. It is certainly annoying but this problem was created by the administrators. There is no way to co-ordinate between these two communities. In fact now there is a bioinformatics stackexchange as well.

ADD REPLY • link 5.6 years ago by Bharat Iyengar ▴ 330

1

This site existed long before the Bioinformatics stack exchange. The prospect of merging has been brought up previously and never gets very far. Some of the Biostars mods are Bioinformatics SE moderators too. A happy medium might be some more moderators who have privileges on both sites to shut cross-posts down in a timely manner.

One of the reasons people like coming to Biostars, in particular, is that SE is quite a hostile place. Biostars is much more welcoming to novice users of bioinformatics tools, who, off the top of my head, I'd say account for close to, if not the majority of, the posts we get.

In practice, I don't think we experience a significant degree of crossposting (at least not that is spotted anyway).

ADD REPLY • link 5.6 years ago by Joe 21k

0

As pointed out by jrj.healy, biostars predates the bioinformatics stackexchange (it may well predate the biology stackexchange as well). Also, since both Heng Li and I are moderators on stackexchange (the bioinfo site, not the biology site) too, we actually can moderate things on both. Obviously posts can't be migrated between independent sites, but that's usually a non-issue...just pick a given site for a particular question.

ADD REPLY • link 5.6 years ago by Devon Ryan 104k

0

Biostars and Biology SE are two different communities, probably with different users. I cross-posted my question to draw on the knowledge of both communities. I also think I should note that I posted this question on both sites.

ADD REPLY • link 5.6 years ago by utsafar ▴ 80

score 2 · Answer 1 · 2018-08-29

As it turns out, the genes you are interested in are indeed paralogs:

http://biosrv.cab.unina.it/athparalogs/main/index/focusE5.php?kwd1=AT1G01040

http://biosrv.cab.unina.it/athparalogs/main/index/focusE5.php?kwd1=AT3G43920

You should have stated that you are looking at proteins from the same species, and ideally named the species and the proteins of interest. Accurate information makes it easier to provide help. If we had known this earlier, we could have pointed you to pATsi from the start, and you could have checked for yourself.

Now, if you are looking for a general method of identifying homologs, look at OMA / OrthoMCL papers, and references therein.
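As a rough illustration of the kind of idea such methods build on, here is a minimal sketch of a reciprocal-best-hit check between two sets of genes. This is not the OMA or OrthoMCL algorithm; it is a much simpler stand-in. The function names, the score dictionaries, and the gene labels are all hypothetical, and real pipelines add significance thresholds, clustering, and handling of in-paralogs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical similarity scores (e.g. alignment bit scores) keyed by
# (query, subject). The values below are made up for illustration.
Scores = Dict[Tuple[str, str], float]


def best_hit(query: str, scores: Scores) -> Optional[str]:
    """Return the subject with the highest score for `query`, or None."""
    hits = {subj: s for (q, subj), s in scores.items() if q == query}
    if not hits:
        return None
    return max(hits, key=hits.get)


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return (a, b) pairs in which each gene is the other's best hit."""
    pairs = []
    for a in sorted({q for q, _ in a_vs_b}):
        b = best_hit(a, a_vs_b)
        if b is not None and best_hit(b, b_vs_a) == a:
            pairs.append((a, b))
    return pairs


a_vs_b = {("geneA1", "geneB1"): 250.0, ("geneA1", "geneB2"): 90.0,
          ("geneA2", "geneB1"): 120.0}
b_vs_a = {("geneB1", "geneA1"): 245.0, ("geneB1", "geneA2"): 118.0,
          ("geneB2", "geneA1"): 88.0}

rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh)
```

In this toy data geneA2 hits geneB1, but geneB1's best hit is geneA1, so only the geneA1/geneB1 pair is reported.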

score 2 · Answer 2 · 2018-08-30

Homology means shared evolutionary ancestry. Sequence similarity is often used as a proxy for homology but inferences should be made with care.

The similarity between two genes/proteins should not just be high; it has to be statistically significant (as measured by metrics such as the E-value) for the two genes/proteins to be considered homologous.

INFERRING HOMOLOGY FROM SIMILARITY

The concept of homology – common evolutionary ancestry – is central to computational analyses of protein and DNA sequences, but the link between similarity and homology is often misunderstood. We infer homology when two sequences or structures share more similarity than would be expected by chance; when excess similarity is observed, the simplest explanation for that excess is that the two sequences did not arise independently, they arose from a common ancestor. Common ancestry explains excess similarity (other explanations require similar structures to arise independently); thus excess similarity implies common ancestry.

However, homologous sequences do not always share significant sequence similarity; there are thousands of homologous protein alignments that are not significant, but are clearly homologous based on statistically significant structural similarity or strong sequence similarity to an intermediate sequence. Thus, when a similarity search finds a statistically significant match, we can confidently infer that the two sequences are homologous; but if no statistically significant match is found in a database, we cannot be certain that no homologs are present.

Pearson, 2013
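To make the significance criterion a little more concrete, here is a minimal sketch of how an expectation value relates to an alignment bit score, using the standard Karlin-Altschul relation E = m * n * 2^(-S'). The function names and the 1e-3 cutoff are assumptions chosen for illustration, and the sketch uses raw sequence lengths, whereas search tools such as BLAST use corrected "effective" lengths.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bit_score`.

    Implements E = m * n * 2**(-S'), where m is the query length,
    n the total database length, and S' the bit score.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def is_significant(e_value: float, threshold: float = 1e-3) -> bool:
    """Apply an (assumed) E-value cutoff; the threshold is a judgment call."""
    return e_value < threshold


# Hypothetical search: a 300-residue query against 10 million residues.
strong = expected_hits(bit_score=50.0, query_len=300, db_len=10_000_000)
weak = expected_hits(bit_score=25.0, query_len=300, db_len=10_000_000)
print(is_significant(strong), is_significant(weak))
```

The same bit score becomes less significant as the database grows, which is one reason a percent-identity figure on its own is not enough to infer homology.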

Members of a protein family are descendants of a common ancestor and are hence homologous. However, in the course of evolution they may have acquired new domains or reshuffled their domains, such that their sequences are no longer similar along their full length. Proteins that have full-length sequence similarity are called homeomorphic (Wu et al., 2004). Therefore, members of a protein family may be homologous but not homeomorphic. However, homeomorphic proteins can evolve independently and therefore may not be considered homologous.
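As a small sketch of the distinction, the snippet below compares two domain architectures and reports which domains are shared and whether the ordered architectures are identical. The domain names are placeholders, not the actual Pfam annotations of the proteins in the question, and an identical domain order is only a crude proxy for full-length similarity; it says nothing by itself about ancestry.

```python
from typing import List, Set, Tuple


def compare_architectures(a: List[str], b: List[str]) -> Tuple[Set[str], float, bool]:
    """Return (shared domains, Jaccard index of the domain sets, same order?)."""
    set_a, set_b = set(a), set(b)
    shared = set_a & set_b
    union = set_a | set_b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard, a == b


# Hypothetical architectures, listed from N- to C-terminus.
protein_1 = ["domA", "domB", "domC", "domD"]
protein_2 = ["domA", "domB", "domD"]

shared, jaccard, same_arch = compare_architectures(protein_1, protein_2)
print(sorted(shared), round(jaccard, 2), same_arch)
```

Here the two proteins share three of four domains but do not have the same architecture, which is the situation described in the question.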

Identifying homologous proteins is, therefore, not a simple task. Machine learning algorithms are used for better identification of homologous proteins. Some of these algorithms are mentioned in the linked papers.

In general, global similarity, rather than local similarity, should be considered for identifying homeomorphs. See https://biology.stackexchange.com/q/11263/3340
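To illustrate the difference between global and local similarity, here is a minimal dynamic-programming sketch that scores a global (Needleman-Wunsch) and a local (Smith-Waterman) alignment. The scoring values (match 2, mismatch -1, linear gap -2) and the toy sequences are arbitrary assumptions; real protein comparisons would use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def alignment_score(s: str, t: str, local: bool,
                    match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score with a linear gap penalty.

    local=False scores a global (Needleman-Wunsch) alignment;
    local=True scores a local (Smith-Waterman) alignment.
    """
    cols = len(t) + 1
    # First row: all zeros for local, cumulative gap penalties for global.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(s) + 1):
        curr = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            curr[j] = cell
        prev = curr
    return best if local else prev[-1]


# Toy sequences: one shared "domain" embedded in unrelated flanking regions.
domain = "MKVLAAGIW"
seq_a = "PPPPPPPP" + domain
seq_b = domain + "GGGGGGGGGGGG"

local_score = alignment_score(seq_a, seq_b, local=True)
global_score = alignment_score(seq_a, seq_b, local=False)
print(local_score, global_score)
```

The local score picks up the shared domain in full, while the global score is penalized for the unrelated flanks, mirroring proteins that share domains without being similar along their full length.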

I don't know the proteins in your example, but if they are from the same protein family, then they are homologous. As someone else pointed out, these genes are indeed paralogs.