Question: Inferring orthology relationships
5.9 years ago by
Ömer An230 wrote:

Extracting orthology relationships follows three main steps: clustering, multiple alignment, and tree building. Broadly, protein sequences are first clustered using all-against-all BlastP, then the sequences within each cluster are multiply aligned and a phylogenetic tree is built from the alignment. Orthology relationships are then inferred from this tree. This is more or less the pipeline used in eggNOG, TreeFam and EnsemblCompara.

My question is: what is the purpose of building the phylogenetic tree? If I only want the orthologs of a gene, the clusters should be enough, since they already give me a group of genes based on sequence similarity. What does the phylogenetic tree add, and is it required to get the orthologs of a gene?

orthology phylogenetic tree • 2.3k views

Maybe I am wrong, but I believe markers go through this procedure in order to establish relationships. Once you have those, you can use them as a framework (I hope this is the right word) for inferring orthology relationships for other genes (proteins). Please, someone correct me if I'm wrong.

written 5.9 years ago by mxs
5.9 years ago by
a.zielezinski9.6k wrote:

The clustering approach does not differentiate between in- and out-paralogs. Your clusters may include many-to-many orthologous relationships between two proteomes. This is the case for large multigene families whose members share a common domain architecture and display high sequence identity (for example, protein kinases). Therefore, you need to examine the phylogenetic trees for duplication and speciation events.
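To make the duplication-versus-speciation point concrete, here is a minimal sketch of one common heuristic, species overlap: an internal node whose two subtrees share at least one species is treated as a duplication, otherwise as a speciation. This is an illustration only, not necessarily what eggNOG, TreeFam or EnsemblCompara do. The nested-tuple tree, the `species|gene` leaf naming and the function names are all assumptions made for the example.

```python
# Sketch: classify gene pairs from a rooted binary gene tree with the
# species-overlap heuristic. Tree format and leaf names are hypothetical.

def leaves(node):
    """Return all leaf labels under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)

def species_of(leaf):
    """Leaf labels are assumed to look like 'species|gene'."""
    return leaf.split("|")[0]

def classify_pairs(node, orthologs=None, paralogs=None):
    """Split all gene pairs into orthologs and paralogs.

    A pair is labelled by the node where its two genes diverge:
    speciation node -> orthologs, duplication node -> paralogs.
    """
    if orthologs is None:
        orthologs, paralogs = [], []
    if isinstance(node, str):
        return orthologs, paralogs
    left, right = node
    left_leaves, right_leaves = leaves(left), leaves(right)
    shared = ({species_of(x) for x in left_leaves}
              & {species_of(x) for x in right_leaves})
    # Shared species on both sides implies a duplication at this node.
    target = paralogs if shared else orthologs
    for a in left_leaves:
        for b in right_leaves:
            target.append(tuple(sorted((a, b))))
    classify_pairs(left, orthologs, paralogs)
    classify_pairs(right, orthologs, paralogs)
    return orthologs, paralogs

# A duplication that predates the human/mouse split: all four genes would
# fall into one similarity cluster, but only two pairs are orthologs.
tree = (("human|A1", "mouse|A1"), ("human|A2", "mouse|A2"))
orthologs, paralogs = classify_pairs(tree)
print("orthologs:", sorted(orthologs))
print("paralogs: ", sorted(paralogs))
```

In this toy tree, `human|A1` and `mouse|A2` sit in the same cluster but are out-paralogs, which is exactly the distinction a cluster alone cannot make.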

However, graph-based methods are more applicable on a global scale: they rely on pairwise comparisons, aiming to produce the highest number of orthologs with minimal error rates. In practice, graph-based methods use a sequence-similarity search algorithm (e.g. BLAST, Smith-Waterman) and a scoring scheme to calculate sequence similarities between all sequences in the genomes being compared. Pairs of genes, one from each species, that are each other's highest-scoring match are considered orthologous. In this way, orthologous genes are assumed to be more similar to each other than they are to any other genes from the compared organisms. Under this assumption, graph-based orthology inference is applicable only to complete genomes (proteomes).
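The "each other's highest-scoring match" rule is usually called reciprocal best hits. A minimal sketch follows, assuming the search results have already been reduced to `(query, subject, score)` tuples; the gene names, scores and function names are made up for illustration, and real tools add filters (e-value, coverage) that are left out here.

```python
# Sketch: reciprocal best hits between two proteomes A and B.
# Input tuples are (query, subject, score); all values are hypothetical.

def best_hits(scores):
    """Map each query to its single highest-scoring subject."""
    best = {}
    for query, subject, score in scores:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

a_vs_b = [("A1", "B1", 250.0), ("A1", "B2", 180.0),
          ("A2", "B2", 300.0), ("A3", "B1", 120.0)]
b_vs_a = [("B1", "A1", 245.0), ("B1", "A3", 118.0),
          ("B2", "A2", 298.0), ("B2", "A1", 175.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here `A3` finds `B1` as its best hit, but `B1` prefers `A1`, so `A3` gets no ortholog. This also shows why a complete proteome matters: if `A1` were missing from the data, `A3` and `B1` would wrongly pair up.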

I recommend this review about computational methods used for orthology inference.

5.9 years ago by
moranr270 wrote:

The tree is very useful for inferring whether a homolog is a true ortholog rather than the product of an ancient duplication (a paralog), among many other things. Maybe have a read of
