I am new to PAML and want to use it to analyze a large number of genes found in a group of bacterial strains. As a first step, I want to obtain just one omega value per gene per tree (model = 0; NSsites = 0). My problem is that my tree includes all investigated species, while some genes are found only in a subset of strains. As a result, codeml can't calculate a tree-wide omega, since it doesn't have sequence data for the whole tree. What would be the best approach to circumvent this problem?
1. Calculate pairwise dN/dS values for my alignment using yn00 and take their average instead of estimating a tree-wide dN/dS.
2. Allow codeml to create its own tree for each gene and then calculate omega. In this case, how much bias could this introduce when comparing between genes, since the sequence alignment for each gene would most likely generate a different tree?
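To make option 1 concrete, here is a minimal sketch in Python of the averaging step. It assumes the pairwise dN and dS estimates have already been extracted from the yn00 output into a list of (dN, dS) tuples. The function name, the `min_ds` cutoff and the example numbers are hypothetical and only for illustration.

```python
from statistics import mean

def average_pairwise_omega(pairs, min_ds=1e-6):
    """Average dN/dS over pairwise estimates.

    pairs  : list of (dN, dS) tuples, one per sequence pair (assumed input)
    min_ds : hypothetical cutoff; pairs with dS at or below it are skipped,
             because their ratio is undefined or unstable
    """
    omegas = [dn / ds for dn, ds in pairs if ds > min_ds]
    if not omegas:
        raise ValueError("no sequence pair has a usable dS value")
    return mean(omegas)

# Made-up (dN, dS) values for three sequence pairs of one gene
example_pairs = [(0.01, 0.10), (0.02, 0.10), (0.03, 0.10)]
print(average_pairwise_omega(example_pairs))
```

Note that a plain mean like this treats all pairs as independent, although pairs that share a strain are not.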
In addition, I have noticed that if I take a file of aligned sequences and just change their order (without re-aligning or anything else), I get different branch-specific omegas when running it through codeml with the same initial tree. Since codeml seems to use the tree as a guide and the tree didn't change, I don't understand why the order of the sequences in the alignment file matters. And if it does matter, what is the preferable order of the sequences?
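One way to confirm that the two input files really differ only in sequence order is a check like the sketch below. It assumes the alignments are in FASTA format (the post does not say which format was used), and the function names and the toy sequences are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a {name: sequence} dict (assumes unique names)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records

def same_alignment(text_a, text_b):
    """True if both files hold identical named sequences, in any order."""
    return read_fasta(text_a) == read_fasta(text_b)

# Toy alignment and a reordered copy of it
original = ">strainA\nATGAAA\n>strainB\nATGAAG\n"
reordered = ">strainB\nATGAAG\n>strainA\nATGAAA\n"
print(same_alignment(original, reordered))
```

If this returns True for the real files, the alignment content is identical and any difference in the results comes from how the order is handled downstream.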
Thank you all for your help,