Question: Ancestry Informative Markers
Hi, how do I find the Ancestry Informative Markers (AIMs) in the 1000 Genomes Project data? Is there a tool to identify the AIMs across all of the given populations? Any help would be appreciated. Thanks in advance.

African American vs. European Ancestry
We selected a grid of 3,388 markers (approximately one per megabase across the autosomes and the X chromosome) that showed strong differentiation between the African- and European-ancestry samples sequenced by the 1000 Genomes Project. Markers previously genotyped on the Illumina Omni 2.5M array were favored, and markers with A/T or G/C alleles (which are strand-ambiguous) were avoided.
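A minimal sketch of this kind of selection, assuming you already have per-SNP alternate-allele frequencies for the African- and European-ancestry samples (for example, computed from the 1000 Genomes VCFs). The differentiation measure here is the absolute allele-frequency difference, and the `min_delta` cut-off of 0.5 is a hypothetical choice, since the description above only says "strong differentiation"; the Omni 2.5M preference is not modelled. `Snp`, `select_grid` and `is_strand_ambiguous` are illustrative names, not part of any existing tool.

```python
from dataclasses import dataclass

# A/T and G/C SNPs are strand-ambiguous: their alleles are each other's
# complements, so a flipped strand cannot be detected from the alleles alone.
AMBIGUOUS = {frozenset("AT"), frozenset("GC")}


@dataclass
class Snp:
    chrom: str
    pos: int         # base-pair position
    ref: str
    alt: str
    freq_afr: float  # alt-allele frequency in African-ancestry samples
    freq_eur: float  # alt-allele frequency in European-ancestry samples


def is_strand_ambiguous(snp: Snp) -> bool:
    return (len(snp.ref) == 1 and len(snp.alt) == 1
            and frozenset(snp.ref + snp.alt) in AMBIGUOUS)


def select_grid(snps, bin_size=1_000_000, min_delta=0.5):
    """Keep the most differentiated, non-ambiguous SNP in each 1 Mb bin.

    min_delta is a hypothetical cut-off on |freq_afr - freq_eur|.
    """
    best = {}  # (chrom, bin index) -> (delta, snp)
    for snp in snps:
        if is_strand_ambiguous(snp):
            continue
        delta = abs(snp.freq_afr - snp.freq_eur)
        if delta < min_delta:
            continue
        key = (snp.chrom, snp.pos // bin_size)
        if key not in best or delta > best[key][0]:
            best[key] = (delta, snp)
    # Chromosomes sort as strings here, which is fine for a sketch.
    return [best[key][1] for key in sorted(best)]


snps = [
    Snp("1", 100, "A", "G", 0.9, 0.1),
    Snp("1", 200, "A", "T", 0.95, 0.0),       # A/T: skipped
    Snp("1", 300, "C", "T", 0.7, 0.1),        # same bin, less differentiated
    Snp("1", 1_500_000, "G", "A", 0.2, 0.3),  # below min_delta
    Snp("2", 50, "C", "A", 0.1, 0.8),
]
print([(s.chrom, s.pos) for s in select_grid(snps)])  # [('1', 100), ('2', 50)]
```

Binning by position is one simple way to get the "roughly one marker per megabase" spacing; a real panel would also weigh array coverage and genotype quality.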

Native American vs. European Ancestry
This is a grid of 1,000 markers selected to be informative for Native American vs. European ancestry. These AIMs were selected to be in low linkage disequilibrium with one another (defined as r² ≤ 0.1 in Native American populations, to be conservative) and to be widely separated (each at least 250 kb from every other European vs. Native American AIM). SNPs with significant within-continent heterogeneity were excluded.
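The two constraints above (r² ≤ 0.1 and at least 250 kb spacing) can be applied greedily, as in the sketch below. It assumes hypothetical inputs: each candidate carries an informativeness score (for example, an allele-frequency difference) and its genotype dosages (0/1/2) in the Native American reference samples. LD r² is approximated by the squared correlation of unphased dosages, a common stand-in for haplotype r². The best-first greedy order is an assumption, not necessarily how this panel was actually built.

```python
def r_squared(x, y):
    """Squared Pearson correlation of two genotype-dosage vectors (0/1/2).

    With unphased genotypes this is a common stand-in for LD r^2.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker has no estimable correlation
    return sxy * sxy / (sxx * syy)


def prune(candidates, min_gap=250_000, max_r2=0.1):
    """Greedily keep the highest-scoring markers that satisfy both constraints.

    candidates: (chrom, pos, score, dosages) tuples; dosages are genotypes
    in the reference (here: Native American) samples.
    """
    chosen = []
    for cand in sorted(candidates, key=lambda c: c[2], reverse=True):
        chrom, pos, _, dos = cand
        conflict = any(
            c_chrom == chrom
            and (abs(c_pos - pos) < min_gap or r_squared(dos, c_dos) > max_r2)
            for c_chrom, c_pos, _, c_dos in chosen
        )
        if not conflict:
            chosen.append(cand)
    return chosen


same = [0, 1, 2, 0, 1, 2]
other = [1, 2, 1, 1, 0, 1]  # uncorrelated with `same`
cands = [
    ("1", 1_000_000, 0.9, same),
    ("1", 1_100_000, 0.8, same),   # within 250 kb: rejected
    ("1", 2_000_000, 0.7, same),   # far enough, but r^2 = 1: rejected
    ("1", 3_000_000, 0.6, other),
    ("2", 1_000_000, 0.5, same),   # different chromosome
]
print([(c[0], c[1]) for c in prune(cands)])
# [('1', 1000000), ('1', 3000000), ('2', 1000000)]
```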

These markers were previously genotyped in three samples of European ancestry (CEU and TSI samples and a population of Spaniards) and six samples of Native Americans (Mayan, Nahuan, Zapoteca, Tepehuano, Quechuan and Aymaran).

Yes, I have built my own predictive model based on the 1000 Genomes data; it has 99% sensitivity/specificity.
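For one population, sensitivity and specificity come from treating that population as the positive class and all other populations as negatives. A small sketch, assuming true and predicted labels are lists of 1000 Genomes super-population codes; the function name is hypothetical.

```python
def sensitivity_specificity(truth, predicted, target):
    """One-vs-rest sensitivity and specificity for population `target`."""
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == target:
            if p == target:
                tp += 1  # target sample correctly assigned
            else:
                fn += 1  # target sample missed
        elif p == target:
            fp += 1      # other sample wrongly assigned to target
        else:
            tn += 1      # other sample correctly kept out
    return tp / (tp + fn), tn / (tn + fp)


truth = ["AFR", "AFR", "EUR", "EUR", "EAS"]
predicted = ["AFR", "EUR", "EUR", "EUR", "AFR"]
print(sensitivity_specificity(truth, predicted, "AFR"))  # (0.5, 0.6666666666666666)
```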

Take a look at my tutorial here: Produce PCA for 1000 Genomes Phase III in VCF format

If you get through that, it would be a great start toward building your own model.
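For intuition about what the PCA step does, here is a pure-Python sketch that computes the first principal component of a samples × SNPs dosage matrix by power iteration. It is not the tutorial's pipeline (for genome-scale data you would use a dedicated tool instead); it only shows that each SNP receives a loading, and SNPs with large absolute loadings are the ones driving the separation between populations. `first_pc` is a hypothetical helper, and the sign of a principal component is arbitrary.

```python
import math
import random


def first_pc(genotypes, iters=200, seed=0):
    """PC1 of a samples x SNPs dosage matrix via power iteration.

    Returns (sample_scores, snp_loadings). The overall sign is arbitrary.
    """
    n_samples, n_snps = len(genotypes), len(genotypes[0])
    # Centre each SNP (column) on its mean dosage.
    means = [sum(row[j] for row in genotypes) / n_samples for j in range(n_snps)]
    x = [[row[j] - means[j] for j in range(n_snps)] for row in genotypes]

    rng = random.Random(seed)
    v = [rng.random() for _ in range(n_snps)]
    for _ in range(iters):
        # One multiplication by X^T X: u = X v, then v = X^T u.
        u = [sum(row[j] * v[j] for j in range(n_snps)) for row in x]
        v = [sum(x[i][j] * u[i] for i in range(n_samples)) for j in range(n_snps)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm == 0:
            raise ValueError("no variation left to decompose")
        v = [a / norm for a in v]
    scores = [sum(row[j] * v[j] for j in range(n_snps)) for row in x]
    return scores, v


# Two toy "populations" that differ at SNPs 0 and 1; SNP 2 is monomorphic.
geno = [[0, 0, 1], [0, 0, 1], [2, 2, 1], [2, 2, 1]]
scores, loadings = first_pc(geno)
print([round(a, 3) for a in loadings])  # [0.707, 0.707, 0.0]
```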


Hi Kevin, sorry about the delay in responding, and thanks for your reply. I understand from your link that you are using PCA to create the model, but I don't understand at which point in the process the AIMs are obtained. It would be really helpful if you could explain it a little. Thanks in advance.

The PCA bi-plot is based on markers that segregate the different 1000 Genomes populations. You then take these markers and test them through regression modelling to see which ones are best at segregating each group. At the end of the day, weather forecasting, stock-market prediction, ethnicity prediction, et cetera are all based on building a model and then predicting.
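One way to make the "test them through regression modelling" step concrete is a single-marker logistic regression per population, one population versus the rest, ranking markers by how much each improves the log-likelihood over an intercept-only model. This is an assumed approach, not necessarily the model described above. The small ridge penalty (`ridge`), learning rate and iteration count are hypothetical settings, chosen so that a perfectly separating marker still gets a finite slope; because of the penalty, the ranking statistic is only approximately a likelihood-ratio statistic.

```python
import math


def softplus(t):
    """log(1 + e^t), written so large |t| cannot overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_lik(b0, b1, x, y):
    """Bernoulli log-likelihood: log p = -softplus(-z), log(1-p) = -softplus(z)."""
    return -sum(softplus(-(b0 + b1 * xi)) if yi else softplus(b0 + b1 * xi)
                for xi, yi in zip(x, y))


def fit_marker(x, y, lr=0.1, ridge=0.01, iters=500):
    """Logistic regression of y (0/1) on one marker's dosages x, by gradient ascent.

    The ridge penalty on the slope keeps it finite if the marker separates
    the groups perfectly.
    """
    n = len(x)
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            r = yi - sigmoid(b0 + b1 * xi)
            g0 += r
            g1 += r * xi
        b0 += lr * g0 / n
        b1 += lr * (g1 / n - ridge * b1)
    return b0, b1


def rank_markers(genotypes, labels, target):
    """Marker indices, best first, for separating `target` from all other labels."""
    y = [1 if lab == target else 0 for lab in labels]
    n, k = len(y), sum(y)
    if k in (0, n):
        raise ValueError("need samples both in and outside the target group")
    ll0 = k * math.log(k / n) + (n - k) * math.log(1 - k / n)  # intercept only
    stats = []
    for j in range(len(genotypes[0])):
        x = [row[j] for row in genotypes]
        b0, b1 = fit_marker(x, y)
        stats.append((2 * (log_lik(b0, b1, x, y) - ll0), j))
    return [j for _, j in sorted(stats, reverse=True)]


# Rows are samples, columns are markers (dosages 0/1/2).
geno = [[0, 0, 1], [0, 1, 1], [0, 1, 1],
        [2, 1, 1], [2, 1, 1], [2, 2, 1]]
labels = ["AFR", "AFR", "AFR", "EUR", "EUR", "EUR"]
print(rank_markers(geno, labels, "AFR"))  # [0, 1, 2]
```

Marker 0 separates the two toy groups perfectly, marker 1 only partly, and marker 2 is monomorphic, so it carries no information. With more than two populations you would repeat the ranking with each population as `target`.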

Okay, got it! Thanks a lot.

