Hi, I would like to use 1000 Genomes VCF data to select Ancestry Informative Markers (AIMs) for my population of interest. The literature describes various ways to select an AIM panel. What is the best approach to select AIMs? Thanks.
There really is no "best way" to select AIMs. A review of the literature will show that every other research group comes up with a method that appeals to them in some way, or makes an "adjustment" to an existing approach that either speeds up computation or gives a "more precise" estimate. I am assuming that your population of interest is not African American or Hispanic, and that this is why you want to select new AIMs. I believe that there are several good AIM panels already described in the literature for African Americans and Latinos.

Before I get into the details, a word of caution. Selecting an AIM panel appears to be a relatively easy thing to do when you have ancestral allele frequencies. But be sure to validate the AIMs in some way, both in populations and in pedigrees.

Before you select your AIMs, decide on what your ancestral population model will be. Essentially, how many ancestral populations do you think contributed to your study population, and what are they? Do you have a sufficient number of ancestral individuals from each group to get good marker information?

If you assume that only two ancestral populations contributed to your population, then it's relatively easy. You can start with delta, the allele frequency difference between the two populations. The classical definition of AIMs, based on the work of Chakraborty and Weiss or later, simply identifies an AIM as a marker with a delta >= 0.3. The higher the delta, the better. You can also look at pairwise branch length. Fst and In (Rosenberg et al. came up with a few such statistics) are also stats you can use to look at how much information regarding ancestry is captured by your markers. But at a very basic level, delta is a fairly good measure.

Now, if you have three or more ancestral populations to choose from, then you have to be more careful. Find a way to balance marker informativeness across all ancestral groups.
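To make the two-population case concrete, here is a minimal sketch of the delta screen described above, with a simple heterozygosity-based Fst alongside it for comparison. The marker names and allele frequencies are made up for illustration, the function names are hypothetical, and the Fst here is the plain (H_T - H_S) / H_T form with equal population weights, not a sample-size-corrected estimator. In practice you would first compute per-population allele frequencies from your VCF.

```python
def delta(p1, p2):
    """Absolute allele frequency difference between two populations."""
    return abs(p1 - p2)


def fst_two_pop(p1, p2):
    """Simple Fst for one biallelic marker: (H_T - H_S) / H_T.

    Assumes equal weight for both populations and no sample-size correction.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # marker is fixed for the same allele in both populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return (h_t - h_s) / h_t


def select_aims(freqs, min_delta=0.3):
    """Keep markers with delta >= min_delta, sorted from most to least informative.

    freqs maps a marker name to a (p1, p2) tuple of allele frequencies.
    """
    scored = [(marker, delta(p1, p2)) for marker, (p1, p2) in freqs.items()]
    kept = [(marker, d) for marker, d in scored if d >= min_delta]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical allele frequencies in two ancestral populations.
freqs = {
    "snp_a": (0.90, 0.10),
    "snp_b": (0.55, 0.45),
    "snp_c": (0.20, 0.60),
}

for marker, d in select_aims(freqs):
    p1, p2 = freqs[marker]
    print(marker, round(d, 3), round(fst_two_pop(p1, p2), 3))
```

The 0.3 cutoff is the classical threshold mentioned above; raising `min_delta` gives a smaller but more informative panel.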
For example, if you have European, African, and Native American as ancestral groups, then for the same set of AIMs you will always have more power to detect the African versus non-African difference. Be sure to enrich the marker panel for the European versus Native American differences. Ideally, you will have the best set of markers when the three pairwise comparisons are nearly equal. You don't want to bias any one pairwise comparison against the others. I believe that some of the other measures of ancestry informativeness can look at multiple groups simultaneously. Some researchers have taken the approach of doing away with AIMs altogether and using ALL GWAS markers, but IMHO, that is a waste of the admixture approach. Hope these comments help, and best of luck with your search.
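One possible way to balance the pairwise comparisons, sketched below, is a greedy pick: at each step, find the population pair with the lowest cumulative delta so far and add the unselected marker that is most informative for that pair. This is only an illustrative heuristic of my own, not a published method, and the population labels, marker names, and frequencies are all hypothetical.

```python
from itertools import combinations


def pairwise_deltas(freqs):
    """Map each population pair to the absolute allele frequency difference.

    freqs maps a population label to the allele frequency of one marker.
    """
    return {
        pair: abs(freqs[pair[0]] - freqs[pair[1]])
        for pair in combinations(sorted(freqs), 2)
    }


def balanced_panel(markers, panel_size):
    """Greedily pick markers so cumulative delta stays similar across pairs.

    markers maps a marker name to a {population: frequency} dict.
    Returns the chosen markers and the cumulative delta per population pair.
    """
    if not markers:
        return [], {}
    deltas = {name: pairwise_deltas(f) for name, f in markers.items()}
    pairs = list(next(iter(deltas.values())))
    totals = {pair: 0.0 for pair in pairs}
    panel = []
    remaining = set(markers)
    while remaining and len(panel) < panel_size:
        # The pair with the least accumulated information gets priority.
        weakest = min(pairs, key=lambda pair: totals[pair])
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda name: deltas[name][weakest])
        panel.append(best)
        remaining.remove(best)
        for pair in pairs:
            totals[pair] += deltas[best][pair]
    return panel, totals


# Hypothetical frequencies in three ancestral groups.
markers = {
    "m1": {"AFR": 0.90, "EUR": 0.10, "NAM": 0.10},
    "m2": {"AFR": 0.95, "EUR": 0.10, "NAM": 0.10},
    "m3": {"AFR": 0.50, "EUR": 0.90, "NAM": 0.20},
    "m4": {"AFR": 0.30, "EUR": 0.20, "NAM": 0.80},
}

panel, totals = balanced_panel(markers, panel_size=2)
print(panel)
print(totals)
```

In this toy data, m1 and m2 have the largest deltas overall but carry no European versus Native American information, so ranking by summed delta alone would pick both of them. The greedy step pulls in m3 as the second marker because it covers the weakest comparison.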