Our lab sequenced 7 genomes from the same organism; these genomes diverged less than 0.5 Mya.
About 42,000 orthologous gene families were constructed from these genomes.
To find positively selected genes,
I used the codeml M2a vs. M1a likelihood ratio test (df = 2) for positive selection.
Using an FDR cutoff of 0.01, I got ~16,000 genes (38%) that appear to be under positive selection.
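For reference, the test described above can be sketched as follows. This is a minimal sketch, not the exact pipeline used: it assumes the log-likelihoods of M1a and M2a have already been extracted from the codeml output, it assumes the FDR control is Benjamini-Hochberg, and the family names and lnL values are hypothetical placeholders rather than real results.

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """P-value of the LRT statistic 2*(lnL_alt - lnL_null) against chi-square, df = 2."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For exactly 2 degrees of freedom the chi-square survival function is exp(-x/2).
    return math.exp(-stat / 2.0)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (lnL under M1a, lnL under M2a) per gene family; not real data.
lnl = {
    "fam1": (-1000.0, -990.0),
    "fam2": (-2000.0, -1999.5),
    "fam3": (-1500.0, -1500.0),
}
names = list(lnl)
pvals = [lrt_pvalue_df2(*lnl[n]) for n in names]
qvals = benjamini_hochberg(pvals)
significant = [n for n, q in zip(names, qvals) if q < 0.01]
print(significant)
```

With these placeholder values only `fam1` passes the 0.01 cutoff; on real data, the fraction passing can be compared directly with the 38% reported above.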
Is codeml suitable for our dataset (genes from different strains of the same species)?
The control file I specified is:
    seqfile = input.seq
    treefile = input.tree
    outfile = mlc
    noisy = 3
    verbose = 1
    runmode = 0
    seqtype = 1
    CodonFreq = 2
    clock = 0
    model = 1 2
    NSsites = 2
    icode = 0
    fix_omega = 0
    omega = .9
    fix_kappa = 0
    kappa = .3
    cleandata = 1
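To make the settings easier to inspect or compare across runs, the control file text can be turned into key/value pairs. The helper below is a hypothetical sketch, not part of PAML: `parse_ctl` is a name introduced here, and it assumes plain `key = value` text without the `*` comments that a full codeml control file may contain.

```python
def parse_ctl(text):
    """Parse 'key = value' settings from one-per-line or flattened text."""
    tokens = text.split()
    settings = {}
    key = None
    for i, tok in enumerate(tokens):
        if tok == "=":
            continue
        # A token directly followed by '=' starts a new setting.
        if i + 1 < len(tokens) and tokens[i + 1] == "=":
            key = tok
            settings[key] = []
        elif key is not None:
            settings[key].append(tok)
    # Multi-token values (e.g. "1 2") are kept as a single string.
    return {k: " ".join(v) for k, v in settings.items()}


# The settings exactly as given in the post.
ctl_text = (
    "seqfile = input.seq treefile = input.tree outfile = mlc noisy = 3 "
    "verbose = 1 runmode = 0 seqtype = 1 CodonFreq = 2 clock = 0 "
    "model = 1 2 NSsites = 2 icode = 0 fix_omega = 0 omega = .9 "
    "fix_kappa = 0 kappa = .3 cleandata = 1"
)
settings = parse_ctl(ctl_text)
for name in ("model", "NSsites", "fix_omega", "omega", "cleandata"):
    print(name, "=", settings[name])
```

Printing `model` and `NSsites` side by side is a quick way to confirm which models a given run actually requested.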