Dear all, as part of a study we have sequenced the full genomes of 74 Toxoplasma gondii isolates (a haploid, zoonotic protozoan parasite; genome size about 70 Mb) using Illumina technology. The isolates/strains are very closely related to each other (clonal population). In this study, we would like to analyze the microdiversity within this clonal population in more detail, and for this we used ADMIXTURE and PCA. To prepare the data for these analyses we LD-pruned the SNPs with PLINK 1.9 and found that we lose a very large number of SNPs.
My question: is it absolutely necessary to LD-prune the SNP datasets (VCFs) of a clonal population before running ADMIXTURE and/or PCA?
Thanks for your help
Pavlo
Dear 4galaxy77,
thank you for your very fast help. If I apply the LD-pruning filter 50 5 0.2, I still have 23373 of 85076 SNPs. So I will test different cutoffs, as you recommended.
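For readers who want to reproduce this kind of pruning scenario, here is a minimal sketch of how the two PLINK 1.9 steps for one cutoff might be scripted. It only builds the command lines and does not run them. The file prefix `toxo` and the helper names are hypothetical, and `--allow-extra-chr` is included on the assumption that the chromosome names are not plain numbers.

```python
def pruning_commands(bfile, window, step, r2, out_prefix):
    """Build the two PLINK 1.9 command lines for one LD-pruning scenario.

    bfile and out_prefix are hypothetical file prefixes; nothing is executed.
    """
    tag = f"{out_prefix}_w{window}_s{step}_r{r2}"
    # Step 1: compute the pruned SNP list (writes <tag>.prune.in / <tag>.prune.out).
    find = ["plink", "--bfile", bfile, "--allow-extra-chr",
            "--indep-pairwise", str(window), str(step), str(r2),
            "--out", tag]
    # Step 2: keep only the SNPs listed in <tag>.prune.in and write a new fileset.
    keep = ["plink", "--bfile", bfile, "--allow-extra-chr",
            "--extract", f"{tag}.prune.in", "--make-bed",
            "--out", f"{tag}_pruned"]
    return find, keep


def retained_fraction(kept, total):
    """Fraction of SNPs that survive pruning."""
    return kept / total


if __name__ == "__main__":
    for cmd in pruning_commands("toxo", 50, 5, 0.2, "toxo"):
        print(" ".join(cmd))
    # SNP counts quoted in the post for the 50 5 0.2 filter.
    print(f"{retained_fraction(23373, 85076):.1%} of SNPs retained")
```

With the counts quoted above, the 50 5 0.2 filter keeps roughly 27% of the SNPs.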
Is there, in general, a rule of thumb as to which cutoffs can be "legally" used/tested? I have tested the following combinations:
The next question arose while I was testing the different cutoffs: is there any way, or any parameter, that can be used to decide which cutoff is stringent enough?
At the moment I am testing all of these cutoffs in an ADMIXTURE analysis as follows:
...and extracting the CVE (cross-validation error) values to check which K has the lowest CVE overall in each cutoff scenario.
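As an illustration of the CVE extraction step, here is a minimal sketch that pulls the cross-validation errors out of ADMIXTURE output and reports the K with the lowest value per cutoff scenario. It assumes ADMIXTURE was run with `--cv`, so that each run prints a line of the form `CV error (K=3): 0.12345`. The scenario labels, function names, and the numbers in the demo are invented placeholders, not results from this study.

```python
import re

# Matches the line ADMIXTURE prints when run with --cv, e.g. "CV error (K=3): 0.12345".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(log_text):
    """Return {K: cv_error} for every CV line found in ADMIXTURE output text."""
    return {int(k): float(v) for k, v in CV_LINE.findall(log_text)}


def best_k(cv_by_k):
    """Return the K with the lowest cross-validation error."""
    return min(cv_by_k, key=cv_by_k.get)


def summarize(logs_by_scenario):
    """Map each pruning scenario to (best K, its CV error)."""
    summary = {}
    for scenario, text in logs_by_scenario.items():
        cv = parse_cv_errors(text)
        if cv:  # skip scenarios whose logs contain no CV lines
            k = best_k(cv)
            summary[scenario] = (k, cv[k])
    return summary


if __name__ == "__main__":
    # Placeholder log text with made-up values, for illustration only.
    demo_logs = {
        "50_5_0.2": "CV error (K=2): 0.40\nCV error (K=3): 0.35\n",
        "50_5_0.5": "CV error (K=2): 0.30\nCV error (K=3): 0.32\n",
    }
    for scenario, (k, err) in summarize(demo_logs).items():
        print(scenario, "best K =", k, "CV error =", err)
```

In practice the log text for each scenario would be read from the captured ADMIXTURE output of the individual runs and concatenated before being passed to `summarize`.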
Finally, I get the following results:
Do you think this is the right way to decide which cutoff should be used for LD pruning?
Thank you for your help.
Kind regards,
Pavlo