I am trying to use a SNP genotyping dataset to infer ancestry with ADMIXTURE, and I want to use the 1000 Genomes Project (1KG) Phase 3 data as background allele frequencies for populations. However, the alleles called in my data do not match those in 1KG. One potential issue is that the MAP file provided in 1KG's repository uses chromosome number and base position in place of an rsID, so I have been using chromosome and position to match SNPs to my dataset.
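A minimal sketch of this kind of position-based matching, assuming both files are standard four-column PLINK MAP files (chromosome, variant ID, genetic distance, base-pair position) and that both datasets use the same genome build. The function names are hypothetical and do not come from any library.

```python
def read_map_positions(lines):
    """Return a dict mapping 'chrom:pos' keys to variant IDs.

    Assumes whitespace-separated PLINK .map lines:
    chromosome, variant ID, genetic distance, base-pair position.
    """
    positions = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, variant_id, _genetic_distance, bp = fields[:4]
        positions[f"{chrom}:{bp}"] = variant_id
    return positions


def shared_positions(map_a, map_b):
    """Return the sorted 'chrom:pos' keys present in both maps."""
    return sorted(set(map_a) & set(map_b))


# Illustrative lines only, built from the SNP discussed in this post.
kg_map = read_map_positions(["1 1:100612675 0 100612675"])
my_map = read_map_positions(["1 rs499479 0 100612675"])
print(shared_positions(kg_map, my_map))
```

Matching on position alone is only meaningful if the coordinates come from the same reference assembly, since positions differ between builds.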
To give an example:
The SNP annotated in 1KG's MAP file with the ID 1:100612675 (chr1, position 100612675) is reported as GG for sample HG02922 in the PED file. This locus maps to rs499479, which in my own dataset is called as having T and C alleles for each sample.
dbSNP reports that the two possible alleles for this SNP are C and T, which would imply that the call is correct in my dataset but not in 1KG. That seems unlikely, though, so I suspect there is some other reason for the discrepancy.
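One check that might help narrow this down is whether the alleles agree once they are complemented, since G is the complement of C. Below is a minimal sketch of such a check, assuming alleles are single A/C/G/T bases with missing calls already removed. The function name and the category labels are hypothetical, and a "strand_flip" result only shows that the alleles are compatible with a strand difference, not that strand is the actual cause.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify_alleles(observed, reference):
    """Compare alleles observed in one dataset with a reference allele set.

    observed:  alleles seen at the locus, e.g. {"G"}
    reference: alleles expected at the locus, e.g. {"C", "T"}
    Assumes every allele is one of A, C, G, T.
    """
    observed = {a.upper() for a in observed}
    reference = {a.upper() for a in reference}
    observed_flipped = {COMPLEMENT[a] for a in observed}
    reference_flipped = {COMPLEMENT[a] for a in reference}

    # A/T and C/G SNPs look identical on both strands, so the alleles
    # alone cannot tell us which strand was reported.
    if reference == reference_flipped:
        return "ambiguous"
    if observed <= reference:
        return "same_strand"
    if observed_flipped <= reference:
        return "strand_flip"
    return "mismatch"


# The case from this post: G observed in 1KG, C/T listed in dbSNP.
print(classify_alleles({"G"}, {"C", "T"}))
```

Running this over every shared SNP would show whether the disagreement is limited to a subset of loci or affects the whole dataset.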
If anyone is able to help out, it'd be appreciated.
Thanks in advance.