I have a question regarding the use of CODEML from the PAML package. I have sequences for 14 strains, 4 of which cause a specific disease. I analyzed the 4-strain and 10-strain groups separately and detected positive selection in each group, using model = 0 and NSsites = 7 8, with a tree built from the multiple sequence alignment of the CDS.
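
For reference, the per-group site-model settings could be written out roughly as below. This is only a minimal sketch and not the exact files used: the group names, file names, and the runmode and seqtype values are assumptions, and only model = 0 and NSsites = 7 8 come from the description above.

```python
def render_ctl(settings):
    """Render a dict of codeml options as 'key = value' control-file lines."""
    return "".join(f"{key:>10} = {value}\n" for key, value in settings.items())


def site_model_ctl(group):
    """Control-file text for one group; all file names are hypothetical."""
    return render_ctl({
        "seqfile": f"{group}.phy",        # hypothetical codon alignment
        "treefile": f"{group}.tre",       # hypothetical tree for this group
        "outfile": f"{group}_M7M8.out",   # hypothetical output name
        "runmode": 0,                     # assumption: evaluate the user tree
        "seqtype": 1,                     # assumption: codon sequences
        "model": 0,                       # one omega ratio for all branches
        "NSsites": "7 8",                 # site models M7 and M8
    })


# One control file per group (the group names are placeholders).
ctl_files = {group: site_model_ctl(group) for group in ("disease_4", "other_10")}
print(ctl_files["disease_4"])
```

Each entry of ctl_files would be saved as its own control file and passed to codeml.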

However, it should also be possible to let CODEML find sites under selection when the two groups are analyzed together. This would require a predefined tree in which the branch separating the two groups is marked. CODEML should then be able to find sites that are under selection in one group only.
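
Marking that branch can be sketched as follows. This is a hedged illustration rather than the procedure actually used: the function name, the strain names, and the example topology are made up, and it assumes a topology-only Newick string (no branch lengths) in which the foreground strains form one parenthesized clade.

```python
import re


def label_foreground(newick, foreground, label="#1"):
    """Insert `label` after the ')' closing the clade whose tips equal `foreground`."""
    target = set(foreground)
    open_clades = []   # tip-name sets for clades whose ')' has not been seen yet
    previous = None
    for match in re.finditer(r"[(),;]|[^(),;\s]+", newick):
        token = match.group()
        if token == "(":
            open_clades.append(set())
        elif token == ")":
            tips = open_clades.pop()
            if tips == target:
                end = match.end()
                return f"{newick[:end]} {label}{newick[end:]}"
            if open_clades:
                open_clades[-1] |= tips   # pass tips up to the enclosing clade
        elif token not in ",;" and previous != ")":
            open_clades[-1].add(token)    # a tip name (labels after ')' are skipped)
        previous = token
    raise ValueError("foreground strains do not form a single clade in this tree")


# Hypothetical 7-strain example: D1-D4 stand in for the disease strains.
tree = "((D1,D2,(D3,D4)),(H1,H2),H3);"
labelled = label_foreground(tree, ["D1", "D2", "D3", "D4"])
print(labelled)  # ((D1,D2,(D3,D4)) #1,(H1,H2),H3);
```

If the foreground strains are not monophyletic in the tree, the function raises an error instead of guessing where the label should go.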

If I recall correctly, these settings should be used: model = 2 and NSsites = 2, since this compares the marked (foreground) branch against the remaining (background) branches. I provided codeml with a custom tree with “#1” at the branch concerned.
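
As a rough sketch of what such a control file might contain: only model = 2 and NSsites = 2 come from the description above. The file names and the runmode, seqtype, fix_omega, and omega values are assumptions added to make the example complete.

```python
# Settings for the combined 14-strain run described above.
branch_site_settings = {
    "seqfile": "all14.phy",             # hypothetical alignment of all 14 strains
    "treefile": "all14_labelled.tre",   # hypothetical tree carrying the "#1" label
    "outfile": "all14_branchsite.out",  # hypothetical output name
    "runmode": 0,      # assumption: evaluate the user-supplied tree
    "seqtype": 1,      # assumption: codon sequences
    "model": 2,        # branch labels in the tree define the omega classes
    "NSsites": 2,      # combined with model = 2 as described above
    "fix_omega": 0,    # assumption: omega is estimated, not fixed
    "omega": 1,        # assumption: starting value for omega
}


def render_ctl(settings):
    """Render a dict of codeml options as 'key = value' control-file lines."""
    return "".join(f"{key:>10} = {value}\n" for key, value in settings.items())


branch_site_ctl = render_ctl(branch_site_settings)
print(branch_site_ctl)
```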

My problem is that the first method (NSsites = 7 8) reports positive selection in a large number of genes for the two groups (10 and 4 strains), while the second method (model = 2, NSsites = 2) doesn’t give any results at all (using the combined groups and an artificial tree).

I’ve compared the results from both groups, and although some genes/sites overlap (positive selection in both groups), in many cases positive selection appears in only one group.

I expected these genes/sites to be found with the model = 2, NSsites = 2 settings as well.

Does anyone have a clue what I might be missing?


Thanks in advance,

Regards, Rico

