Question

The best way to use PAML to analyze genes found only in some species on my tree

9.0 years ago

seb85il • 0

Hi all,

I am new to PAML and want to use it to analyze a large number of genes found in a group of bacterial strains. As a first step, I want to obtain a single omega (dN/dS) value per gene across the whole tree (model = 0; NSsites = 0). My problem is that my tree includes all investigated species, while some genes are found only in a subset of strains. As a result, Codeml can't calculate a tree-wide omega, since it doesn't have information for the whole tree. What would be the best approach to circumvent this problem:

1. Calculate pairwise dN/dS values for my alignment using yn00 and take their average, instead of estimating a tree-wide dN/dS.
2. Allow Codeml to build its own tree for each gene and then calculate omega. In this case, how much bias could this introduce when comparing genes, since the sequence alignment for each gene would most probably produce a different tree?
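To make option 1 concrete, here is a minimal Python sketch of the averaging step. It assumes the pairwise dN and dS values have already been extracted from the yn00 output into `(dN, dS)` tuples; the function name `average_pairwise_omega` and the dS cut-offs `min_ds` and `max_ds` are hypothetical choices for illustration, not anything defined by PAML.

```python
from statistics import mean


def average_pairwise_omega(pairs, min_ds=0.01, max_ds=2.0):
    """Average dN/dS over strain pairs.

    pairs  : iterable of (dN, dS) tuples, one per pair of strains.
    min_ds : assumed lower cut-off; pairs with near-zero dS give
             unstable or undefined ratios and are skipped.
    max_ds : assumed upper cut-off; pairs with very large dS may be
             saturated and are skipped.
    Returns the mean ratio, or None if no pair passes the filter.
    """
    ratios = [dn / ds for dn, ds in pairs if min_ds <= ds <= max_ds]
    return mean(ratios) if ratios else None


# Made-up values for three strain pairs; the last pair has dS = 0
# and is therefore excluded from the average.
example = [(0.02, 0.20), (0.03, 0.30), (0.01, 0.0)]
print(average_pairwise_omega(example))
```

Note that this averages the per-pair ratios, which is not the same as dividing the mean dN by the mean dS, so the choice between the two would need to be consistent across genes.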

In addition, I have noticed that if I take a file of aligned sequences and just change their order (without re-aligning or anything else), I get different branch-specific omegas when running it through Codeml with the same initial tree. Since Codeml seems to use the tree as a guide, and the tree didn't change, I don't understand why the order of the sequences in the alignment file matters. And if it does, what is the preferable order of the sequences?
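For comparing such runs, one option is to write every alignment in a fixed, reproducible order before passing it to Codeml. The sketch below only does the reordering and says nothing about why the results differ. It assumes the alignment has already been read into `(name, sequence)` tuples; `reorder_alignment` is a hypothetical helper, not part of PAML.

```python
def reorder_alignment(records, order=None):
    """Return alignment records in a fixed order.

    records : list of (name, aligned_sequence) tuples.
    order   : optional list of names giving the desired order
              (for example, the tip order of the tree file);
              if omitted, records are sorted alphabetically by name.
    The sequences themselves are not modified or re-aligned.
    """
    by_name = dict(records)
    if len(by_name) != len(records):
        raise ValueError("duplicate sequence names in alignment")
    names = sorted(by_name) if order is None else list(order)
    missing = [n for n in names if n not in by_name]
    if missing:
        raise KeyError(f"names not found in alignment: {missing}")
    return [(n, by_name[n]) for n in names]


# Made-up three-strain alignment given in arbitrary order.
aln = [("strainC", "ATGAAA"), ("strainA", "ATGAAG"), ("strainB", "ATGAGA")]
for name, seq in reorder_alignment(aln):
    print(name, seq)
```

Passing the same `order` list for every gene keeps the input order identical across runs, so any remaining differences between runs cannot come from the ordering of the file.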

Thank you all for your help,

Evgeni

dN-dS CodeML PAML alignment • 2.8k views

updated 21 months ago by Ram • written 9.0 years ago by seb85il