Question: Population Genetics from SNP data- no VCF
16 months ago, vidyavuru wrote:

Hi, this is going to be a long question:

My dataset consists of tetraploid flowers, which can be divided into wild and garden types.

1) From what I've read, Tajima's D and pi are population-specific. That is, they mostly reflect population structure and should be computed within a defined population.

Based on this, my idea was to align the sequences separately by type (wild/garden) and measure Tajima's D on each alignment.

My problem is that the R package I found for estimating Tajima's D (PopGenome) does not consider heterozygous SNPs, so it would not be accurate for the tetraploid SNPs in these sequences. I then found the package 'snpR' (Hemstrow), which calculates Tajima's D and pi over a sliding window, but I'm unable to specify populations in that function. I ended up thinking that I could just split the data by flower type, but then I wouldn't be filtering out SNPs that are monomorphic within a type. I also think that shouldn't matter, because Tajima's D only measures variation within a given population.

Do you have any suggestions for how I could measure Tajima's D and pi?
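To make the calculation concrete, here is a minimal sketch of pi and Tajima's D computed directly from alt-allele dosages for one population, without a VCF. It assumes biallelic SNPs, no missing data, and that the dosage (0 to 4 copies of the alternate allele) can be called for each tetraploid individual, which is often the hard part in practice. The function name `tajimas_d` and the input layout are hypothetical, not taken from PopGenome or snpR.

```python
import math

def tajimas_d(dosages, ploidy=4):
    """Return (pi, S, D) for one population.

    dosages: list of sites; each site is a list with one alt-allele
             count (0..ploidy) per individual. Layout is an assumption.
    pi is the sum over sites; divide by window length for per-bp pi.
    """
    if not dosages or not dosages[0]:
        raise ValueError("need at least one site and one individual")
    n = ploidy * len(dosages[0])          # sampled chromosomes
    if n < 4:
        raise ValueError("need at least 4 sampled chromosomes")

    pi = 0.0
    seg_sites = 0
    for site in dosages:
        k = sum(site)                     # alt-allele count at this site
        if 0 < k < n:                     # monomorphic sites add nothing
            seg_sites += 1
            pi += 2.0 * k * (n - k) / (n * (n - 1))

    if seg_sites == 0:
        return pi, seg_sites, float("nan")

    # Constants from Tajima (1989)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    var = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    if var <= 0:
        return pi, seg_sites, float("nan")
    d = (pi - seg_sites / a1) / math.sqrt(var)
    return pi, seg_sites, d

# Two tetraploid individuals (8 chromosomes), four SNPs;
# the last two sites are monomorphic within this population.
wild = [[1, 0], [2, 2], [0, 0], [4, 4]]
pi, S, D = tajimas_d(wild)
```

In this sketch, sites that are monomorphic within the subset contribute to neither pi nor the count of segregating sites, so splitting the data by flower type without re-filtering does not change the result.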

2) For Ka/Ks, I have come across numerous papers that use DnaSP, KaKs_Calculator, or seqinr::kaks(), but I don't think ploidy is really taken into consideration in any of them.

i) From what I understand, Ka/Ks is calculated between populations, to find the rate of mutation relative to a certain reference sequence. In that case, if I have, for example, 10 cut flowers and 10 flowers of other types, how do I compare Ka/Ks?

From what I've read, I would take the CDS sequences, align them, and then measure Ka/Ks (using, for example, seqinr::kaks()), but I might lose information on some of the SNPs by blindly aligning all flowers of the same type together. Is there a better procedure to handle this?

ii) I also found some analyses that assess Ka/Ks per position. My question is then: how are you comparing between populations?

Is there a better way to assess Ka/Ks in R?
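As an illustration of what the Ka/Ks tools do at the codon level, here is a rough sketch that classifies differences between two aligned, in-frame CDS sequences as synonymous or nonsynonymous using the standard genetic code. It is only a building block, not a full Ka/Ks estimate: it does not count synonymous and nonsynonymous sites, does not correct for multiple hits, and skips codons that differ at more than one position or contain ambiguity codes (for example IUPAC codes used for heterozygous tetraploid calls). The function name `classify_codon_differences` is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered over T, C, A, G
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_codon_differences(seq_a, seq_b):
    """Count synonymous / nonsynonymous single-base codon differences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, equal length")
    counts = {"synonymous": 0, "nonsynonymous": 0, "skipped": 0}
    for i in range(0, len(seq_a), 3):
        ca = seq_a[i:i + 3].upper()
        cb = seq_b[i:i + 3].upper()
        if ca == cb:
            continue
        n_diff = sum(x != y for x, y in zip(ca, cb))
        # Skip gaps, ambiguity codes, and multi-base changes
        if ca not in CODON_TABLE or cb not in CODON_TABLE or n_diff != 1:
            counts["skipped"] += 1
        elif CODON_TABLE[ca] == CODON_TABLE[cb]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts

# AAA -> AAG keeps lysine; GGT -> GAT changes glycine to aspartate
result = classify_codon_differences("ATGAAAGGT", "ATGAAGGAT")
```

The "skipped" count makes visible how much information is lost when heterozygous positions are encoded as ambiguity codes before alignment.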

Tags: population genetics, snp, R

Are you wedded to estimating summary statistics in R? I'm a big fan of using R whenever I can, but I find the packages lacking for this level of non-model population genetics.

Comment written 16 months ago by Brice Sarver