Question: Population genetics from SNP data (no VCF)
16 months ago by
vidyavuru0 wrote:

Hi, this is going to be a long question:

My dataset consists of tetraploid flowers, which can be divided into wild and garden types.

1) From what I've read, Tajima's D and pi are population-specific. That is, they summarize variation within a single population and should mostly be computed within a defined population.

Based on this, my idea was to align the sequences separately for each type (wild/garden) and measure Tajima's D for each type from its own alignment. My problem is that the R package I found for estimating Tajima's D (PopGenome) does not consider heterozygous SNPs, so it would not be accurate for tetraploid SNPs in these sequences. I then found the package 'snpR' (Hemstrow), which calculates Tajima's D and pi over a sliding window, but I'm unable to specify populations in that function. I ended up thinking that I could just split the data by flower type before running it, but then I wouldn't be filtering out SNPs that are monomorphic within a type. I also think this shouldn't matter, because Tajima's D is measured only within a given population.

Do you have any suggestions for how I could measure Tajima's D and pi?

2) For Ka/Ks, I have come across numerous papers that use DnaSP, KaKs_Calculator, or seqinr's kaks(), but I don't think ploidy is really taken into consideration.

i) From what I understand, Ka/Ks is computed between populations, to find the rate of mutation relative to a certain reference sequence. In that case, if I have, for example, 10 cut flowers and 10 flowers of other types, how do I compare Ka/Ks between them?

From what I've read, I would take the CDS sequences, align them, and then measure Ka/Ks (using, for example, seqinr::kaks()), but then I might lose information on some of the SNPs by blindly aligning all flowers of the same type together. Is there a better procedure to handle this?

ii) I also found some analyses that assess Ka/Ks per position. My question then is: how do you compare between populations?

Is there a better way to assess Ka/Ks in R?
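
For reference, the pairwise calculation described above (take two aligned CDS sequences, then measure Ka/Ks) can be sketched roughly as follows. This is a minimal Python illustration of the Nei-Gojobori (1986) counting approach with a Jukes-Cantor correction, which is not necessarily the method implemented by the tools named above. The function names (`syn_sites`, `codon_diffs`, `ka_ks`) are made up for this example. It assumes an in-frame alignment under the standard genetic code, treats each input as a single haploid sequence (so ploidy is not modelled), skips codons containing gaps, ambiguity codes, or stops, and counts changes to a stop codon as nonsynonymous when tallying sites.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))

def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1.0 / 3.0
    return s

def codon_diffs(c1, c2):
    """(synonymous, nonsynonymous) differences, averaged over mutation paths."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    sd_total = nd_total = 0.0
    n_paths = 0
    for order in permutations(diff_pos):
        cur, sd, nd, valid = c1, 0, 0, True
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":  # ignore paths through a stop codon
                valid = False
                break
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        if valid:
            sd_total += sd
            nd_total += nd
            n_paths += 1
    if n_paths == 0:  # crude fallback: call every change nonsynonymous
        return 0.0, float(len(diff_pos))
    return sd_total / n_paths, nd_total / n_paths

def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    return float("nan") if p >= 0.75 else -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    s_sites = sd_sum = nd_sum = 0.0
    n_codons = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # gap, ambiguity code, or stop codon
        s_sites += (syn_sites(c1) + syn_sites(c2)) / 2.0
        sd, nd = codon_diffs(c1, c2)
        sd_sum += sd
        nd_sum += nd
        n_codons += 1
    n_sites = 3.0 * n_codons - s_sites
    if s_sites == 0 or n_sites == 0:
        return float("nan"), float("nan")
    return jukes_cantor(nd_sum / n_sites), jukes_cantor(sd_sum / s_sites)

# Toy example: one synonymous difference (GCT -> GCA, both alanine).
ka, ks = ka_ks("GCTGCTGCTGCT", "GCTGCTGCTGCA")
print(ka, ks)
```

Because the estimate is defined for a pair of sequences, comparing groups of samples needs some extra convention (for example, averaging over pairs within and between groups). That choice is not part of this sketch.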


Are you wedded to estimating summary statistics in R? I'm a big fan of using R whenever I can, but I find the packages lacking for this level of non-model population genetics.
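
If stepping outside R is an option, both statistics can be computed directly from per-site allele counts. The following is a minimal Python sketch, assuming biallelic SNPs, no missing data, and known allele dosage, so that each tetraploid individual contributes four chromosomes to the sample size `n`. The function name `pi_and_tajimas_d` and the input layout are hypothetical. The formulas are the standard Tajima (1989) ones; whether their assumptions hold for a given tetraploid is a separate question that this sketch does not address.

```python
import math

def pi_and_tajimas_d(alt_counts, n):
    """Return (pi, Watterson's theta, Tajima's D) for one population.

    alt_counts: per-site count of the alternate allele among the n sampled
                chromosomes (n = 4 * number of tetraploid individuals,
                assuming dosage calls are available).
    pi and theta are sums over sites, not per-base averages.
    """
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    pi = 0.0
    seg = 0  # number of segregating sites
    for k in alt_counts:
        if k < 0 or k > n:
            raise ValueError("allele count outside 0..n")
        if 0 < k < n:  # sites monomorphic in this subset contribute nothing
            seg += 1
            pi += 2.0 * k * (n - k) / (n * (n - 1))
    if seg == 0:
        return 0.0, 0.0, float("nan")

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    theta_w = seg / a1
    var = e1 * seg + e2 * seg * (seg - 1)
    d = (pi - theta_w) / math.sqrt(var) if var > 0 else float("nan")
    return pi, theta_w, d

# Toy example: 2 tetraploid individuals (n = 8), five sites,
# two of which are monomorphic within this subset (counts 0 and 8).
wild_counts = [1, 1, 1, 0, 8]
print(pi_and_tajimas_d(wild_counts, n=8))
```

In this formulation, sites that are monomorphic within the subset add nothing to either pi or the number of segregating sites, so splitting the data by flower type without re-filtering leaves the summed statistics unchanged. It would matter only if pi were reported per site, in which case the denominator should be the number of sites actually surveyed.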