Question: Evolutionary Distance Among Species For Orthology Analysis
Damian Kao wrote (6.8 years ago):

I am using OrthoMCL on 4-5 species (complete proteomes) to find groups of orthologous genes. Will the evolutionary distances among these species affect my results? If species A and species B are in the same genus and all other species are in separate phyla, will OrthoMCL be biased towards putting A and B exclusively in an orthologous group?

From my results, I seem to be getting proportionally more exclusive A-B groups, which makes sense since those two species are more closely related. But when I BLAST some of the genes in the A-B groups, I get decent hits to the other species I used in OrthoMCL.
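To put a number on "proportionally more exclusive A-B groups", one option is to tally the species composition of every group. The following is a minimal sketch, assuming a groups file where each line looks like `group_id: taxon|protein taxon|protein ...`; the function name and the example IDs are hypothetical.

```python
from collections import Counter

def species_composition(groups_lines):
    """Count how many groups contain each exact set of species."""
    counts = Counter()
    for line in groups_lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: "group_id: taxon|protein taxon|protein ..."
        _, _, members = line.partition(":")
        species = frozenset(m.split("|", 1)[0] for m in members.split())
        if species:
            counts[species] += 1
    return counts

# Hypothetical example: two A-B exclusive groups, one group shared with C.
example = [
    "G1: A|a1 B|b1",
    "G2: A|a2 B|b2 C|c1",
    "G3: A|a3 B|b3",
]
composition = species_composition(example)
for species, n in sorted(composition.items(), key=lambda kv: -kv[1]):
    print(",".join(sorted(species)), n)
```

Comparing the count for the A-B set against the counts for sets that include the distant species gives a rough picture of how skewed the grouping is.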

The algorithm doesn't seem to be described very well in their paper, and the source code for the orthology finding is basically a set of messy SQL calls. There does seem to be some kind of weighting procedure to normalize the BLAST scores. Does anyone have any thoughts, or suggestions for alternative methods or software?


I think OrthoMCL is essentially clustering based on similarities calculated from all-vs-all BLAST results. Could you try some phylogeny-based methods? I imagine that would give you some A and B lineage-specific duplications.

Comment written 6.8 years ago by Vitis
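The idea of clustering on all-vs-all BLAST similarities, together with the score normalization the question asks about, can be sketched roughly as follows. This is a simplified illustration and not OrthoMCL's actual code: it finds reciprocal best hits between species, then divides each pair's score by the average score for that species pair, which is how I understand the normalization described in the OrthoMCL paper. The `species|protein` ID layout, the function names, and the scores are all assumptions made for the example.

```python
from collections import defaultdict

def species_of(protein_id):
    # Assumed ID layout: "species|protein"
    return protein_id.split("|", 1)[0]

def reciprocal_best_hits(scores):
    """scores maps (query, subject) -> similarity score (higher is better)."""
    best = {}
    for (query, subject), score in scores.items():
        if species_of(query) == species_of(subject):
            continue  # only between-species hits are considered here
        key = (query, species_of(subject))
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    pairs = set()
    for (query, _), (subject, _) in best.items():
        back = best.get((subject, species_of(query)))
        if back is not None and back[0] == query:
            pairs.add(frozenset((query, subject)))
    return pairs

def normalize_by_species_pair(scores, pairs):
    """Divide each pair's mean score by the average for its species pair."""
    raw = {}
    by_species = defaultdict(list)
    for pair in pairs:
        a, b = tuple(pair)
        raw[pair] = (scores[(a, b)] + scores[(b, a)]) / 2
        by_species[frozenset(map(species_of, pair))].append(raw[pair])
    normalized = {}
    for pair, weight in raw.items():
        group = by_species[frozenset(map(species_of, pair))]
        normalized[pair] = weight / (sum(group) / len(group))
    return normalized

# Hypothetical scores: A and B are close (high scores), C is distant.
scores = {
    ("A|a1", "B|b1"): 200, ("B|b1", "A|a1"): 200,
    ("A|a2", "B|b2"): 100, ("B|b2", "A|a2"): 100,
    ("A|a1", "C|c1"): 50, ("C|c1", "A|a1"): 50,
}
pairs = reciprocal_best_hits(scores)
normalized = normalize_by_species_pair(scores, pairs)
```

In this toy data the raw A-C score (50) is much lower than the A-B scores, but after dividing by the per-species-pair average the A-C pair ends up with a weight comparable to the A-B pairs. That is the intended effect of this kind of normalization; whether it fully removes the bias towards close species in real data is a separate question.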
Asaf wrote (6.8 years ago):

I don't know if the software is available, but you may find SYNERGY useful. Another algorithm for finding clusters of orthologous proteins is OMA; precomputed clusters are available in the OMA browser.

14134125465346445 (United Kingdom) wrote (6.8 years ago):

The original OrthoMCL algorithm didn't do any ad hoc weighting of the species in the sets, although this might have changed. An alternative to OrthoMCL that does take ingroup and outgroup species into account is to use hcluster_sg to do the clustering on your BLAST scores:

Click on the Download GNU tarball at the bottom of the page.

The input file has three columns, A B C, where proteins A and B are followed by C, the BLAST score or E-value (scaled to, say, 0-100). A second, optional file is the "categories" file. This software allows you to define these "categories", see as an example. In these categories you can split your sets into species that are very close together (ingroups) and species that can be called outgroups, and outgroups can also have different levels. Ingroups for close subgroups, then outgroups of different levels, are taken into account during clustering, so that you don't leave too many outgroup proteins behind just because they are more distant in the phylogenetic tree than the ingroup species.
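As a rough illustration of preparing that three-column input, here is a sketch that converts tabular BLAST output (the standard 12-column layout, with the E-value in column 11) into `A<TAB>B<TAB>weight` lines. The tab-separated layout and the scaling, weight = min(100, round(-log10(E-value) / 2)), are assumptions on my part; check the hcluster_sg documentation for the exact format it expects. The function names are hypothetical.

```python
import math

def evalue_to_weight(evalue, max_weight=100):
    """Scale an E-value to an integer weight in 0..max_weight (assumed scaling)."""
    if evalue <= 0.0:
        return max_weight  # BLAST reports 0.0 for extremely small E-values
    weight = round(-math.log10(evalue) / 2)
    return max(0, min(max_weight, weight))

def blast_to_edges(blast_lines):
    """Yield 'A<TAB>B<TAB>weight' lines from 12-column tabular BLAST output."""
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query == subject:
            continue  # skip self-hits
        weight = evalue_to_weight(evalue)
        if weight > 0:
            yield f"{query}\t{subject}\t{weight}"

# Hypothetical hit: A|a1 vs B|b1 with an E-value of 1e-50.
example_hits = ["A|a1\tB|b1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200"]
edges = list(blast_to_edges(example_hits))
```

Dropping zero-weight edges keeps very weak hits out of the graph; whether that threshold is appropriate depends on the data.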

The hcluster_sg software was (and I think still is) the software used in the EnsemblCompara GeneTrees pipeline. It scales really well, it is used for trees that span the whole tree of life (eukarya, bacteria and archaea), and it produces very decent protein clusters given the right categories.

qiyunzhu wrote (6.8 years ago):

I have tried several of the popular programs, but none works perfectly for my data. I decided to use the OrthoMCL results because I guess it is the most popular one.

Here is an old but handy review of orthology-identification algorithms and programs. Hope you will find it useful!

Kuzniar A, van Ham RC, Pongor S, Leunissen JA. The quest for orthologs: finding the corresponding gene across genomes. Trends Genet. 2008 Nov;24(11):539-51.
