Question: COG Paralogous and Orthologous Classes
Zhe Liu wrote:

Dear all,

I want to use COG to find orthologous genes across all bacteria. However, I found that COG (eggNOG) contains not only orthologous genes across organisms but also paralogous genes.

As stated on the COG database website and here:

Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.

Paralogous genes may diverge in function after gene duplication, so the genes under a single COG ID might differ in function. I therefore wonder whether there is any indication of whether a given COG ID was built from paralogous or from orthologous genes.

If we cannot distinguish paralogous (in-paralog) genes from orthologous genes this way, are there other ways to do this job?

Thank you very much, and I am looking forward to your reply!

Tag: orthologues
Written 7.6 years ago by Zhe Liu; modified 7.6 years ago by zhliu.tju.
Lars Juhl Jensen (Copenhagen, Denmark) wrote:

You need to realize that, by definition, there is no way to avoid getting paralogs for the problem you describe.

Imagine that you have an ancestral species (A) with one gene (A1) from some gene family. Through a speciation event, A becomes species B and C, each of which still has one copy of the gene (B1 and C1). Next, a gene duplication event in C gives two copies of the gene (C1 and C2), which subsequently diverge in function over a long time. It is important to realize here that C1 and C2 are completely equivalent: C2 may just as well have kept the ancestral function as C1. Through yet another speciation event, species C becomes D and E, each of which has two paralogous genes, D1/D2 and E1/E2.

You now have three extant organisms, B, D, and E, with the genes B1, D1, D2, E1, and E2. If you trace their origin, B1 and each of the D and E genes go back to A1 through a speciation event (the split of A into B and C). D1, D2, E1, and E2 are thus all orthologs of B1, even though D1 and D2 are paralogs of each other.
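
To make the example concrete, here is a minimal Python sketch. It assumes the gene tree is already known and that each internal node is labeled as a speciation or a duplication event. The class and function names (`Node`, `leaves`, `lca_event`, `classify`) are hypothetical and only for illustration. The sketch uses the event-based definition: two genes are orthologs if their lowest common ancestor in the gene tree is a speciation node, and paralogs if it is a duplication node.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional

@dataclass
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for a leaf (an extant gene)
    children: List["Node"] = field(default_factory=list)

def leaves(node: Node) -> List[str]:
    """Names of all extant genes below `node`."""
    if not node.children:
        return [node.name]
    return [gene for child in node.children for gene in leaves(child)]

def lca_event(node: Node, a: str, b: str) -> Optional[str]:
    """Event label at the lowest common ancestor of genes a and b (both must lie below `node`)."""
    for child in node.children:
        below = leaves(child)
        if a in below and b in below:
            return lca_event(child, a, b)  # both genes sit in one child: the LCA is deeper
    return node.event  # the genes are split between children: `node` is their LCA

def classify(tree: Node) -> dict:
    """Label every gene pair as ortholog (LCA = speciation) or paralog (LCA = duplication)."""
    return {
        (a, b): "ortholog" if lca_event(tree, a, b) == "speciation" else "paralog"
        for a, b in combinations(sorted(leaves(tree)), 2)
    }

# Gene tree of the example: A -> (B, C); duplication in C gives C1, C2; then C -> (D, E).
tree = Node("A1", "speciation", [
    Node("B1"),
    Node("dup_in_C", "duplication", [
        Node("C1", "speciation", [Node("D1"), Node("E1")]),
        Node("C2", "speciation", [Node("D2"), Node("E2")]),
    ]),
])

for (a, b), relation in classify(tree).items():
    print(f"{a}-{b}: {relation}")
```

The output labels all four D and E genes as orthologs of B1, and also D1-E1 and D2-E2 as orthologs. D1-D2, E1-E2, D1-E2, and D2-E1 come out as paralogs. In particular, E2 is both an ortholog of B1 and a paralog of D1.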

There is no way to "fix" this, because it is not an error. It is the reality. The gene A1 does not have a one-to-one ortholog in D or E. The gene E2 is both an ortholog of B1 and a paralog of D1! If you were to remove it to avoid paralogs, you would be removing one of the correct orthologs. And there is no guarantee that E1 is the one with ancestral function whereas E2 has taken on some different function. It could just as well be the other way around. The ancestral function may even have been divided among the two copies that each do part of it (this is known as subfunctionalization).

Since you want orthologs across all of bacteria, you need orthologous groups defined with respect to the last common ancestor of all bacteria. Every later gene duplication produces genes that, as in the example above, are paralogs of some genes and orthologs of others at the same time. You simply have to accept that there will be paralogs in your set. This is not a flaw of the COG database. Paralogs are an unavoidable logical consequence of dealing with orthology across multiple species.
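
As an illustration of what "orthologous groups defined with respect to an ancestor" means for the B/D/E example, here is a simplified Python sketch. It is not how COG or eggNOG actually build their groups. It assumes three things: a gene tree labeled with speciation and duplication events, a hypothetical species tree given as a child-to-parent map (`SPECIES_PARENT`), and a toy naming convention where the first letter of a gene name is its species. Each duplication is placed at the last common ancestor, in the species tree, of the species below it. If that ancestor is the chosen level itself, the copies were already separate genes at that level and go into separate groups. Otherwise the duplication happened later, and its copies stay in one group. `groups_at` and the other helpers are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for a leaf gene
    children: List["Node"] = field(default_factory=list)

# Hypothetical species tree of the example, as child -> parent: A -> (B, C), C -> (D, E).
SPECIES_PARENT = {"B": "A", "C": "A", "D": "C", "E": "C"}

def leaves(node: Node) -> List[str]:
    if not node.children:
        return [node.name]
    return [gene for child in node.children for gene in leaves(child)]

def species_of(gene: str) -> str:
    return gene[0]  # toy convention: the first letter of a gene name is its species

def is_within(sp: Optional[str], level: str) -> bool:
    """True if species-tree node `sp` is `level` itself or one of its descendants."""
    while sp is not None:
        if sp == level:
            return True
        sp = SPECIES_PARENT.get(sp)
    return False

def species_lca(species: List[str]) -> str:
    """Last common ancestor of the given species in the species tree."""
    candidate = species[0]
    while not all(is_within(s, candidate) for s in species):
        candidate = SPECIES_PARENT[candidate]
    return candidate

def groups_at(node: Node, level: str) -> List[List[str]]:
    """Orthologous groups of the genes below `node`, defined relative to ancestral species `level`."""
    anc = species_lca([species_of(g) for g in leaves(node)])
    if not is_within(anc, level):
        # This part of the gene tree predates `level` or lies outside its clade: look deeper.
        return [grp for child in node.children for grp in groups_at(child, level)]
    if node.event == "duplication" and anc == level:
        # Duplication in the lineage leading to `level`: the copies were already separate there.
        return [grp for child in node.children for grp in groups_at(child, level)]
    # Everything below happened after `level`: one group, later paralogs included.
    return [sorted(leaves(node))]

tree = Node("A1", "speciation", [
    Node("B1"),
    Node("dup_in_C", "duplication", [
        Node("C1", "speciation", [Node("D1"), Node("E1")]),
        Node("C2", "speciation", [Node("D2"), Node("E2")]),
    ]),
])

print(groups_at(tree, "A"))  # relative to the common ancestor of B, D and E
print(groups_at(tree, "C"))  # relative to the ancestor of D and E only
```

At level A, the sketch returns a single group, `[['B1', 'D1', 'D2', 'E1', 'E2']]`, which includes the paralogs. At level C, it returns `[['D1', 'E1'], ['D2', 'E2']]`. The deeper the reference ancestor, the more later duplications end up inside each group.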

Written 7.6 years ago by Lars Juhl Jensen; modified 7.6 years ago.

Thank you very much for your quick reply. I will look for other ways to solve this problem, and when I find a good one, I will post it here.

Reply written 7.6 years ago by Zhe Liu.
Manu Prestat (Marseille, France) wrote:

FamFetch, a tool that lets you mine the HOGENOM database using a specified "evolution layout", is definitely the tool you want.

Written 7.6 years ago by Manu Prestat.

Thank you very much, I will have a look at this tool.

Reply written 7.6 years ago by Zhe Liu.
zhliu.tju wrote:

I have found that for eukaryotes, Ensembl is probably the best resource for finding orthologous genes. Some genes carry a one-to-one orthology label, and such pairs are usually regarded as orthologous genes.
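
For illustration, here is a rough Python sketch of how one-to-one, one-to-many, and many-to-many labels can be derived from a list of ortholog pairs. It uses a simplified per-pair count with hypothetical function names and is not Ensembl's actual pipeline, which works on its own gene trees and may differ in details. The input is the ortholog pairs from the B/D/E example earlier in the thread. As before, it assumes the first letter of a gene name is its species.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def species_of(gene: str) -> str:
    return gene[0]  # toy convention: the first letter of a gene name is its species

def relationship_types(ortholog_pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], str]:
    """Label each ortholog pair as one-to-one, one-to-many or many-to-many (simplified, per pair)."""
    # How many orthologs does each gene have in each other species?
    n_orthologs = defaultdict(int)
    for a, b in ortholog_pairs:
        n_orthologs[(a, species_of(b))] += 1
        n_orthologs[(b, species_of(a))] += 1

    labels = {}
    for a, b in ortholog_pairs:
        a_side = n_orthologs[(a, species_of(b))]  # genes in b's species that are orthologs of a
        b_side = n_orthologs[(b, species_of(a))]  # genes in a's species that are orthologs of b
        if a_side == 1 and b_side == 1:
            labels[(a, b)] = "one-to-one"
        elif a_side == 1 or b_side == 1:
            labels[(a, b)] = "one-to-many"
        else:
            labels[(a, b)] = "many-to-many"
    return labels

# Ortholog pairs from the B/D/E example earlier in the thread.
pairs = [("B1", "D1"), ("B1", "D2"), ("B1", "E1"), ("B1", "E2"), ("D1", "E1"), ("D2", "E2")]
for pair, label in relationship_types(pairs).items():
    print(pair, label)
```

Only D1-E1 and D2-E2 come out as one-to-one. Every pair involving B1 is one-to-many, so keeping only one-to-one orthologs would drop B1 entirely. This is the trade-off described in the earlier answer.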

For prokaryotes, OMA is a database designed to provide one-to-one orthologous genes across different species.

Written 7.6 years ago by zhliu.tju.