PAML sample size question and variation in results
2.1 years ago
DNAngel

Hi all,

I posted my question in the PAML discussion group twice and haven't heard anything back in months, so I'm hoping someone here can explain this to me:

I have a gene dataset with two partitions, A and B. I split it into an A-only and a B-only dataset and run the random-sites models on each to see whether there is any positive selection in either one. To test for divergent selection, I then run Clade model C (CmC).

Across the roughly 40 genes I'm analyzing, the random-sites models support positive selection (i.e., the M2a vs. M1a test is significant) for about 30 genes in partition A but only 15 in partition B. In the CmC analysis, however, every gene shows purifying but divergent selection: even when the test is significant, both partitions have dN/dS < 1.

I am wondering whether the random-sites models are more affected by sample size. Partition A has 40 species, but I only have data for 15 species in partition B. If this discrepancy really is a case of "more species = more variation detected" by the random-sites models, why do the two partitions look so similar in CmC, where I use the combined dataset with the partitions labeled and partition B set as the foreground?
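Both comparisons above are likelihood ratio tests, so a minimal sketch of how their p-values are obtained may help pin down what "significant" means in each case. It assumes the standard degrees of freedom (2 for M2a vs. M1a, and 1 for CmC vs. its usual null model M2a_rel), treats the log-likelihoods as numbers you would copy from each codeml run's output, and uses made-up values. The function names are hypothetical helpers, not part of PAML.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null), floored at 0."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, using closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def m2a_vs_m1a(lnl_m1a: float, lnl_m2a: float) -> float:
    """p-value for the random-sites test (M2a adds 2 free parameters)."""
    return chi2_sf(lrt_statistic(lnl_m1a, lnl_m2a), df=2)


def cmc_vs_m2a_rel(lnl_m2a_rel: float, lnl_cmc: float) -> float:
    """p-value for CmC against M2a_rel (CmC adds 1 free parameter)."""
    return chi2_sf(lrt_statistic(lnl_m2a_rel, lnl_cmc), df=1)


def count_significant(lnls: dict, alpha: float = 0.05) -> int:
    """Count genes whose M2a vs. M1a test is significant at level alpha.

    lnls maps a (hypothetical) gene name to a (lnL_M1a, lnL_M2a) pair.
    """
    return sum(1 for lnl_m1a, lnl_m2a in lnls.values()
               if m2a_vs_m1a(lnl_m1a, lnl_m2a) < alpha)


# Made-up log-likelihoods for two hypothetical genes in one partition.
partition_a = {"gene1": (-1000.0, -995.0), "gene2": (-500.0, -499.5)}
print(count_significant(partition_a))
```

The `max(0, ...)` floor treats a null run that ends with a slightly better lnL than the alternative as "no evidence"; in practice that situation usually means one of the runs did not converge well and is worth rerunning with different starting values.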

Hope this made sense! Thank you!

