Question: How to deal with one-to-many orthologies in PAML
Solowars (Brazil/Porto Alegre/UFRGS) wrote, 23 months ago:

Hello everyone,

I want to perform a PAML analysis using codeml. For this I have an alignment of protein-coding DNA sequences from a broad array of animals.

However, in some cases, gene orthology in my species is not 1-to-1 (i.e. some animals have more than one ortholog, and hence more than one sequence). The problem is that PAML accepts only one sequence per species, so I have to choose among these multiple sequences.

I looked through the PAML manual, and there is no guidance on this issue (which, I believe, must be fairly common). I made some trees and distance matrices, but in some cases the multiple copies are just "equally" far from their respective orthologs.

Can you suggest any "best practice" to deal with this issue?

Thanks a lot!

paml orthology dn/ds • 600 views

In general, people focus on one-to-one orthologs (and skip one-to-many or many-to-many cases) in their analyses, precisely to avoid this problem.
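A minimal sketch of such a filter, assuming the orthogroups are already available as a mapping from group ID to sequence IDs and that each sequence ID follows a hypothetical `species|gene` naming convention. The function name, the group IDs and the example data are illustrative only and do not come from PAML or any orthology tool.

```python
from collections import Counter


def species_of(seq_id):
    # Assumed (hypothetical) naming convention: "<species>|<gene_id>".
    return seq_id.split("|", 1)[0]


def one_to_one_groups(orthogroups, required_species):
    """Keep only orthogroups with exactly one sequence per required species."""
    expected = {sp: 1 for sp in required_species}
    kept = {}
    for group_id, seq_ids in orthogroups.items():
        counts = Counter(species_of(s) for s in seq_ids)
        # Rejects groups with extra copies, missing species or unexpected species.
        if dict(counts) == expected:
            kept[group_id] = sorted(seq_ids)
    return kept


# Illustrative data only.
orthogroups = {
    "OG1": ["human|A1", "mouse|a1", "fly|a"],
    "OG2": ["human|B1", "human|B2", "mouse|b1", "fly|b"],  # one-to-many
    "OG3": ["human|C1", "mouse|c1"],                       # fly is missing
}

kept = one_to_one_groups(orthogroups, ["human", "mouse", "fly"])
print(sorted(kept))
```

Only the groups that pass would then be aligned and handed to codeml; the discarded groups are the ones where a choice among copies would otherwise be needed.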

(comment by Biojl, 23 months ago)

I have read several papers that filter and keep only genes with one-to-one orthology, but I didn't think this problem was so "insurmountable".

(comment by Solowars, 23 months ago)
lieven.sterck (VIB, Ghent, Belgium) wrote, 23 months ago:

Consider yourself still lucky; in the plant field it's nearly all many-to-many relationships :(

From what I read, you are already on the right track. What people usually do is collect as much 'circumstantial evidence' as possible (the ensemble approach) to support the choice of one of the orthologs. That can indeed be phylogenetic tree info, distance metrics, genomic location info, simple BLAST hits, and so on. In essence (and in the ideal case) you get enough of those to boil it down to a single gene, but in reality you often will not!
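As a rough illustration of combining several lines of evidence, the sketch below lets each evidence type "vote" for the candidate it favours and only returns a winner when the vote is unambiguous. The evidence names, the scores and the simple majority rule are all assumptions made for this example, not an established method; in practice the weighting would need to be chosen per dataset.

```python
def pick_ortholog(candidates, higher_is_better):
    """Choose one candidate gene by majority vote across evidence types.

    candidates: {gene_id: {evidence_name: value}}
    higher_is_better: {evidence_name: True if larger values are better}
    Returns (winner or None, votes per gene).
    """
    votes = {gene: 0 for gene in candidates}
    for evidence, higher in higher_is_better.items():
        values = {g: ev[evidence] for g, ev in candidates.items()}
        best = max(values.values()) if higher else min(values.values())
        winners = [g for g, v in values.items() if v == best]
        # Evidence that cannot separate the candidates casts no vote.
        if len(winners) == 1:
            votes[winners[0]] += 1
    top = max(votes.values())
    leaders = [g for g, n in votes.items() if n == top]
    # No winner if nothing voted or if the candidates are tied on votes.
    winner = leaders[0] if len(leaders) == 1 and top > 0 else None
    return winner, votes


# Hypothetical evidence for two copies found in one species.
higher_is_better = {
    "tree_distance": False,       # distance to the reference ortholog
    "syntenic_neighbours": True,  # conserved neighbouring genes
    "blast_bitscore": True,       # best hit against the reference
}
candidates = {
    "geneA": {"tree_distance": 0.12, "syntenic_neighbours": 4, "blast_bitscore": 810.0},
    "geneB": {"tree_distance": 0.12, "syntenic_neighbours": 1, "blast_bitscore": 790.0},
}

winner, votes = pick_ortholog(candidates, higher_is_better)
print(winner, votes)
```

Here the two copies are equally distant in the tree, so that evidence is ignored and the choice rests on synteny and the BLAST score. When every line of evidence is tied the function returns `None`, which corresponds to the case where the choice cannot be boiled down to a single gene.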

Here is a nice example of such an approach.

The question you asked is also frequently referred to as the "holy grail in bioinformatics", so you likely cannot expect a complete (or even any) answer.

If applicable, you can of course (as suggested) focus only on the 1-to-1 orthologs to make your life easier.


Thank you so much for your insightful answer. I've been thinking about this issue for some time, and it's good to know that I'm by no means alone. Again, thanks!

(comment by Solowars, 23 months ago)