Hi, this is going to be a long question:
My dataset is basically flowers that are tetraploid in nature. These flowers can be divided into wild and garden flowers.
1) From what I've read, Tajima's D and pi are population specific. That is, they mostly describe variation within a population and should be calculated within one defined population at a time.
Based on this, my idea was to align the sequences by type (wild/garden) and measure Tajima's D separately for each type.
My problem is that the R package I found for estimating Tajima's D (PopGenome) does not consider heterozygous SNPs, and thus would not be accurate for the tetraploid SNPs in these sequences. I then found the package 'snpR' (Hemstrow), which does calculate Tajima's D and pi over a sliding window, but I'm unable to specify populations in that function. I ended up thinking that I could just split the data by flower type before running it, but then I wouldn't be filtering out SNPs that are monomorphic within each subset. I also think that this shouldn't matter, because Tajima's D is only measured within a given population.
Could you give any suggestions or ideas on how I could measure Tajima's D and pi?
2) For Ka/Ks, I have come across numerous papers that use DnaSP, KaKs_Calculator, or seqinr's kaks(), but I don't think ploidy is really taken into consideration.
i) From what I understood, Ka/Ks is calculated between populations, to find out the rate of mutation with respect to a certain reference sequence. In that case, if, for example, I have 10 cut flowers and 10 other types of flowers, how do I compare the Ka/Ks?
From what I read, I would take the CDS sequences, align them, and then measure Ka/Ks (using, for example, seqinr::kaks()), but then I might be losing information on some of the SNPs by blindly aligning all flowers of the same type together. Is there a better procedure to handle this?
ii) I also found some analyses that assess Ka/Ks per position. My question here is then, how are you comparing between populations?
Is there a better way to assess Ka/Ks in R?
Are you wedded to estimating summary statistics in R? I'm a big fan of using R whenever I can, but I find the packages lacking for this level of non-model population genetics.
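For the Tajima's D and pi part, the statistics themselves only need per-site allele counts, so one way to respect ploidy is to count each tetraploid individual as four sampled chromosomes. Below is a minimal Python sketch of that idea, not a replacement for a dedicated tool. It assumes biallelic SNPs, no missing data, and that you already have alternate-allele dosages (0 to 4) per individual for one population; the function name `pi_and_tajimas_d` and the input layout are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pi_and_tajimas_d(dosages, ploidy=4):
    """Sum-of-sites pi, number of segregating sites S, and Tajima's D.

    dosages: list of sites; each site is a list with one alt-allele
             dosage (0..ploidy) per individual of ONE population.
    Assumes biallelic sites and no missing genotypes (illustrative only).
    """
    n = ploidy * len(dosages[0])          # sampled chromosomes
    pi_sum, seg = 0.0, 0
    for site in dosages:
        k = sum(site)                     # alt-allele copies at this site
        if 0 < k < n:                     # polymorphic within this subset
            seg += 1
            # expected pairwise differences at a biallelic site
            pi_sum += 2.0 * k * (n - k) / (n * (n - 1))
    if seg == 0:
        return pi_sum, seg, None          # D is undefined without variation

    # Constants from Tajima (1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    d = (pi_sum - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))
    return pi_sum, seg, d

# Toy data: 4 sites x 2 tetraploid individuals (made-up dosages)
wild = [[1, 0], [2, 2], [0, 0], [4, 4]]
pi_sum, seg, d = pi_and_tajimas_d(wild)
print(pi_sum, seg, d)
```

Sites that are monomorphic within the subset add nothing to either S or the summed pi here, so splitting by flower type without refiltering does not change D. If you want pi per site, though, divide the sum by the number of callable sites in the window, not by the number of SNPs.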