Question: [orthomcl] Proteins with more than one predicted ortholog
gustavoborin01 (University of Campinas, Brazil) wrote 4.2 years ago:

Hi everyone,

I have predicted orthologs between two fungi using the OrthoMCL algorithm, but when I look at the output table, many proteins from one fungus have more than one hit in the other, and vice versa. How can I tell whether a protein has two orthologs in the other fungus or only one?

Besides, the table gives a "normalized score" for each pair of predicted orthologs. Does anyone know what it means? I was looking for a formula or a simple explanation, but the only thing I've found is this: "Normalize ortholog and co-ortholog pairs for any two species by averaging the e-values across them, and normalize using that average" (http://www.ncbi.nlm.nih.gov/pubmed/21901743). I understand it is a normalized value related to the e-value, but how is it computed? Curiously, the maximum value is 1.576, and many of the orthologs with more than one hit in the other fungus have this score too.

 

An02g14170 e_gw1.1.1058.1 0.241
An01g08960 e_gw1.1.1090.1 1.576
An15g05520 e_gw1.1.1090.1 1.576
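
One way to see which proteins have more than one predicted partner is to group the table rows by protein ID. The following is a minimal sketch, assuming the table is whitespace-separated with three columns (protein from species 1, protein from species 2, normalized score); the function name `partners_per_protein` is made up for illustration and is not part of OrthoMCL.

```python
from collections import defaultdict

def partners_per_protein(lines):
    """Map each protein ID to the set of proteins it is paired with."""
    partners = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        a, b, _score = fields
        partners[a].add(b)
        partners[b].add(a)
    return partners

# The three example rows from the table above.
table = """\
An02g14170 e_gw1.1.1058.1 0.241
An01g08960 e_gw1.1.1090.1 1.576
An15g05520 e_gw1.1.1090.1 1.576
"""

partners = partners_per_protein(table.splitlines())

# Keep only proteins paired with more than one protein of the other species.
multi = {p: sorted(q) for p, q in partners.items() if len(q) > 1}
print(multi)  # {'e_gw1.1.1090.1': ['An01g08960', 'An15g05520']}
```

Here e_gw1.1.1090.1 is paired with two proteins of the other fungus. Whether both are true orthologs cannot be decided from this grouping alone.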

The parameters that I used to find the orthologs were these:

- evalueExponentCutoff = -5 (BLAST e-value ≤ 1e-5; recommended value);

- percentMatchCutoff = 70;

- I (inflation factor) = 1.5 (recommended value).

 

Thank you so much for any help!

 

 

Tags: evalue, orthomcl, orthologs
Jean-Karim Heriche (EMBL Heidelberg, Germany) wrote 4.2 years ago:

The score is explained in the paper describing the OrthoMCL procedure (it is referenced in the article you mention). OrthoMCL is essentially clustering of proteins based on sequence similarity. Its advantage is scalability; its disadvantage is that it cannot properly infer orthology relationships, for which you need a phylogenetic tree.


@Jean-Karim: Thank you for your answer, but the only explanation in this paper is "a normalized similarity score", and it recommends consulting the OrthoMCL Algorithm Document for the normalization function. I read this document (https://docs.google.com/document/d/1RB-SqCjBmcpNq-YbOYdFxotHGuU7RK_wqxqDAMjyP_w/pub), but I'm still not sure what these score values mean. Would it be the formula given under the topic "Find potential co-ortholog pairs": "Each CO(Ax,By) is given a pair weight: O(Ax,By) = (-log10(evalue(Ax,By)) + -log10(evalue(By,Ax))) / 2"? Furthermore, do you know which blastp parameter I can use to see only 1:1 hits? Thanks again!

Reply by gustavoborin01, 4.2 years ago

The description of the algorithm is in ref 7 of the paper you cite: Li L, Stoeckert CJ, Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Research. 2003;13:2178–89.

In particular, see Fig. 2.

The raw score is as you describe above: the average of the -log10 of the e-values obtained from blastp A vs B and B vs A. This provides a measure of similarity between any two sequences. Before the MCL clustering algorithm is applied, this score is normalized by dividing it by the average weight of all pairs between the two species. For example, for two genes A and B, with A from fly and B from mouse, the raw score is (-log10(evalue(A,B)) + -log10(evalue(B,A))) / 2, and the normalized score is this value divided by the average of all raw scores between fly and mouse. You don't need or want blastp to return only one hit; just take the best hit for each query sequence, which should always be the first in the list returned by blastp.

Reply by Jean-Karim Heriche, 4.2 years ago
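
The normalization described in the reply above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not OrthoMCL's actual code: the function names, protein IDs, and e-values are made up, and it assumes every e-value is greater than zero (an e-value reported as 0 needs special handling that this sketch does not attempt, and `math.log10(0)` raises an error).

```python
import math

def raw_weight(evalue_ab, evalue_ba):
    """Average of -log10 of the two reciprocal blastp e-values."""
    return (-math.log10(evalue_ab) - math.log10(evalue_ba)) / 2

def normalized_weights(pairs):
    """Divide each raw weight by the mean raw weight over all pairs.

    `pairs` maps (protein_a, protein_b) to (evalue A vs B, evalue B vs A)
    and is assumed to hold only pairs between the same two species.
    """
    raw = {pair: raw_weight(e_ab, e_ba) for pair, (e_ab, e_ba) in pairs.items()}
    mean = sum(raw.values()) / len(raw)
    return {pair: w / mean for pair, w in raw.items()}

# Hypothetical fly/mouse pairs with invented e-values.
pairs = {
    ("flyA", "mouseA"): (1e-50, 1e-40),  # raw weight about 45
    ("flyB", "mouseB"): (1e-10, 1e-20),  # raw weight about 15
}

for pair, score in normalized_weights(pairs).items():
    print(pair, round(score, 3))
```

With these invented values the mean raw weight is about 30, so the two normalized scores come out near 1.5 and 0.5. A normalized score above 1 therefore means the pair is more similar than the average pair between the two species.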

Thank you so much for your help, Jean-Karim. It's the first time I've read a good explanation of what the normalized score used by OrthoMCL is and how to calculate it.

Reply by gustavoborin01, 4.2 years ago