Tag SNPs are a set of SNPs whose genotypes predict the genotypes of other SNPs in their surrounding haplotype blocks (haploblocks). In some tagging experiments, however, one need not refer to haploblocks at all, and can instead scan genome-wide for highly informative SNPs that define a particular group.
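To make the basic idea concrete, here is a minimal Python sketch of one common way to pick tag SNPs: a greedy, pairwise-r² selection that repeatedly adds the SNP capturing the most not-yet-tagged SNPs, until every SNP has r² at or above a threshold with at least one tag. It assumes an unphased genotype matrix coded as 0/1/2 allele counts (so r² here is a genotype correlation, not a haplotype-based r²). The function names and the 0.8 threshold are illustrative only and are not taken from any particular tool.

```python
import numpy as np


def pairwise_r2(genotypes):
    """Squared correlation (r^2) between every pair of SNP genotype columns."""
    g = np.asarray(genotypes, dtype=float)
    centred = g - g.mean(axis=0)
    norms = np.sqrt((centred ** 2).sum(axis=0))
    norms[norms == 0] = np.inf  # monomorphic SNPs: correlation treated as 0
    z = centred / norms
    return (z.T @ z) ** 2


def greedy_tag_snps(genotypes, r2_threshold=0.8):
    """Greedily choose tag SNPs until every SNP has r^2 >= threshold with some tag."""
    r2 = pairwise_r2(genotypes)
    np.fill_diagonal(r2, 1.0)  # every SNP trivially tags itself
    untagged = set(range(r2.shape[0]))
    tags = []
    while untagged:
        cols = sorted(untagged)
        # How many still-untagged SNPs would each candidate capture?
        coverage = (r2[:, cols] >= r2_threshold).sum(axis=1)
        best = int(np.argmax(coverage))
        tags.append(best)
        untagged -= {j for j in cols if r2[best, j] >= r2_threshold}
    return sorted(tags)


# Toy data: SNPs 0-2 are perfectly correlated (SNP 2 with opposite coding), SNP 3 is not
geno = np.array([[0, 0, 2, 1],
                 [1, 1, 1, 0],
                 [2, 2, 0, 1],
                 [0, 0, 2, 2],
                 [1, 1, 1, 0]])
print(greedy_tag_snps(geno))  # [0, 3]
```

In a haploblock-based design you would run this within each block; for a genome-wide scan you would run it over windows instead.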
During my PhD, I developed a method for identifying haplotype-tagging CNVs in order to distinguish the 4 populations among the 270 samples of the International HapMap Project, but this was before 1000 Genomes data had even been released and before R packages became very popular. That said, in my tutorial here on Biostars I am technically defining tag SNPs on the 1000 Genomes Phase III data, and these tag SNPs are highly informative of each respective population group: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format
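For the genome-wide flavour, where you look for SNPs that define a particular group, one simple way to score informativeness is the absolute allele-frequency difference between the group of interest and all other samples. The sketch below assumes a 0/1/2 genotype matrix and one population label per sample; the function name is hypothetical, and this is an illustration only, not the method from my PhD or from the tutorial (metrics such as Fst are common alternatives).

```python
import numpy as np


def group_informative_snps(genotypes, labels, target, top_k=10):
    """Rank SNPs by |allele frequency in target group - frequency in all other samples|."""
    g = np.asarray(genotypes, dtype=float)
    in_group = np.asarray(labels) == target
    if in_group.all() or not in_group.any():
        raise ValueError("target must label some, but not all, samples")
    p_in = g[in_group].mean(axis=0) / 2.0   # counted-allele frequency inside the group
    p_out = g[~in_group].mean(axis=0) / 2.0  # counted-allele frequency in everyone else
    delta = np.abs(p_in - p_out)
    order = np.argsort(-delta, kind="stable")[:top_k]  # most differentiated SNPs first
    return order, delta[order]
```

Ranking by a single-SNP statistic like this ignores LD between the top hits, so in practice you would still prune the selected SNPs afterwards.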
In the tutorial, the tagging method I use is based on linkage disequilibrium and the variance inflation factor (VIF, defined as 1/(1 − R²), where R² comes from regressing a SNP on the other SNPs in its bin; see the section entitled 'Prune variants from each chromosome'), whereby tagging SNPs are identified within SNP bins across the entire genome. In fact, you'll find that most tagging SNP methods are based on linkage disequilibrium metrics in one form or another.
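To illustrate the VIF idea, below is a simplified Python re-implementation of window-based VIF pruning, where each SNP is regressed on all other retained SNPs in its window and the most collinear SNP is dropped until every VIF is at or below the threshold. It is not plink's code: the order in which SNPs are dropped, the illustrative defaults (window of 50 SNPs, step of 5, VIF threshold of 2) and the edge-case handling are my assumptions, and it assumes monomorphic SNPs were already filtered out. For real data, use plink itself.

```python
import numpy as np


def window_vif(window):
    """VIF = 1 / (1 - R^2) for each SNP regressed on all other SNPs in the window."""
    n, k = window.shape
    vifs = np.ones(k)
    for j in range(k):
        y = window[:, j]
        ss_tot = float(((y - y.mean()) ** 2).sum())
        if ss_tot == 0.0:
            vifs[j] = np.inf  # monomorphic SNP carries no information; flag for pruning
            continue
        # Design matrix: intercept plus every other SNP in the window
        X = np.column_stack([np.ones(n), np.delete(window, j, axis=1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1.0 - float(resid @ resid) / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 - 1e-12 else 1.0 / (1.0 - r2)
    return vifs


def vif_prune(genotypes, window=50, step=5, threshold=2.0):
    """Return indices of SNPs kept after window-based VIF pruning.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts, SNPs in genomic order.
    """
    g = np.asarray(genotypes, dtype=float)
    n_snps = g.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    for start in range(0, n_snps, step):
        while True:
            # SNPs still retained inside the current window
            idx = [i for i in range(start, min(start + window, n_snps)) if keep[i]]
            if len(idx) < 2:
                break
            vifs = window_vif(g[:, idx])
            worst = int(np.argmax(vifs))
            if vifs[worst] <= threshold:
                break  # everything left in this window is sufficiently independent
            keep[idx[worst]] = False  # drop the most collinear SNP and re-test
    return np.flatnonzero(keep)
```

Refitting after each removal matters: once one of two duplicated SNPs is gone, the other's VIF usually falls back below the threshold, so it is kept as the tag.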
I am not aware of many R implementations for tag SNPs. As mentioned in this previous answer, HaploView would be a good standalone choice: A: Measure Tag Snps, R Package, Tools
You could easily apply the method that I used and also export your data to HaploView for further interrogation. Hopefully you are familiar with loading data into these programs (note that plink can export data in HaploView format).