Question

Inferring orthology relationships

2


10.3 years ago

bounlu ▴ 270

Orthology inference typically follows three main steps: clustering, multiple alignment, and tree building. Broadly, protein sequences are first clustered using all-against-all BLASTP; the sequences within each cluster are then aligned, and a phylogenetic tree is built from the alignment. Orthology relationships are then inferred from this tree. This is more or less the pipeline used by eggNOG, TreeFam and EnsemblCompara.

My question is: what is the purpose of building the phylogenetic tree? If I only want the orthologs of a gene, the clusters should be enough, since they already give me a group of genes based on sequence similarity. What does the phylogenetic tree add, and is it required to get the orthologs of a gene?

orthology phylogenetic tree • 3.5k views

updated 3.1 years ago by Ram 45k • written 10.3 years ago by bounlu ▴ 270

0


Maybe I am wrong, but I believe markers go through this procedure in order to get the relationships. Once you have those, you can use them as an infrastructure (I hope this is the right word) for inferring orthology relationships for other genes (proteins). Someone please correct me if I'm wrong.

updated 3.1 years ago by Ram 45k • written 10.3 years ago by mxs ▴ 530

Ram · Answer 1 · 2015-04-01

The clustering approach does not differentiate between in- and out-paralogs. Your clusters may include many-to-many orthologous relationships between two proteomes. This is the case for large multigene families whose members share a common domain architecture and high sequence identity (for example, protein kinases). Therefore, you need to examine the phylogenetic trees for duplication and speciation events: genes that diverged at a speciation node are orthologs, while genes that diverged at a duplication node are paralogs.
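To make the point about duplication and speciation events concrete, here is a minimal sketch of one common way to read them off a gene tree, the species-overlap rule: a node is treated as a duplication if its two child subtrees share at least one species, and as a speciation otherwise. The tree, the gene names and the `SPECIES_gene` naming convention are all made up for illustration; real pipelines may use full species-tree reconciliation instead.

```python
# Toy gene tree as nested 2-tuples; leaves are "SPECIES_gene" strings.
# All names here are hypothetical and only serve the illustration.

def species_of(leaf):
    """Extract the species prefix from a leaf label such as 'human_A'."""
    return leaf.split("_")[0]

def leaves(node):
    """Return all leaf labels below a node."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)

def classify_pairs(node, pairs=None):
    """Label each pair of genes by the event at their last common ancestor.

    Species-overlap rule: if the two child subtrees share a species,
    the node is called a duplication (pairs across it are paralogs);
    otherwise it is called a speciation (pairs across it are orthologs).
    """
    if pairs is None:
        pairs = {}
    if isinstance(node, str):
        return pairs
    left, right = node
    left_leaves, right_leaves = leaves(left), leaves(right)
    shared = ({species_of(x) for x in left_leaves}
              & {species_of(x) for x in right_leaves})
    relation = "paralogs" if shared else "orthologs"
    for a in left_leaves:
        for b in right_leaves:
            pairs[frozenset((a, b))] = relation
    classify_pairs(left, pairs)
    classify_pairs(right, pairs)
    return pairs

# A duplication that predates the human/mouse split yields subfamilies A and B.
tree = (("human_A", "mouse_A"), ("human_B", "mouse_B"))
pairs = classify_pairs(tree)

for pair, relation in sorted((sorted(p), r) for p, r in pairs.items()):
    print(pair, relation)
```

A similarity-based cluster would simply contain all four genes. The tree adds the information that `human_A` is orthologous to `mouse_A` only, while `mouse_B` is an out-paralog of `human_A`.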

However, graph-based methods are more applicable on a global scale: they rely on pairwise comparisons and aim to recover the highest number of orthologs with minimal error rates. In practice, graph-based methods use a sequence-similarity search algorithm (e.g. BLAST or Smith-Waterman) and a scoring scheme to calculate similarities between all sequences in the genomes being compared. Pairs of genes, one from each species, that are each other's highest-scoring match are considered orthologous. The underlying assumption is that orthologous genes are more similar to each other than they are to any other gene from the compared organisms. Because of this assumption, graph-based orthology inference is only applicable to complete genomes (proteomes).
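The "each other's highest-scoring match" criterion is usually called reciprocal best hits. A minimal sketch, assuming the similarity search has already been run and its results loaded into dictionaries of `(query, subject) -> score`; the gene names and scores below are invented, and real tools add further filters (E-value, coverage, tie handling) that are left out here.

```python
def best_hits(scores):
    """Map each query gene to its single highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score
    (e.g. a BLAST bit score). On ties, the first pair seen wins.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return gene pairs (a, b) that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)

# Hypothetical scores between three human genes and two mouse genes.
human_vs_mouse = {
    ("hA", "mA"): 900, ("hA", "mB"): 400,
    ("hB", "mB"): 850, ("hB", "mA"): 380,
    ("hC", "mA"): 300,
}
mouse_vs_human = {
    ("mA", "hA"): 900, ("mA", "hB"): 380, ("mA", "hC"): 300,
    ("mB", "hB"): 850, ("mB", "hA"): 400,
}

rbh = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
print(rbh)
```

Here `hC` has `mA` as its best hit, but `mA` prefers `hA`, so `hC` is left without a predicted ortholog. This also shows why complete proteomes matter: if the true best match is missing from the data, a more distant gene can wrongly become the reciprocal best hit.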

I recommend this review about computational methods used for orthology inference.

Ram · Answer 2 · 2015-04-01

2


10.3 years ago

moranr ▴ 290

The tree is very useful for inferring whether a homolog is a true ortholog rather than a paralog arising from an ancient duplication, amongst many other things. Maybe have a read of https://classes.soe.ucsc.edu/bme225/Fall07/lecture8_1.orthology.pdf

updated 3.1 years ago by Ram 45k • written 10.3 years ago by moranr ▴ 290