7.7 years ago by
Actually, one approach would be to first identify substructure within your population (using PCA rather than STRUCTURE, I would say).
If clear patterns emerge, you can divide your sample into more homogeneous subsets.
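As a rough illustration of the PCA step, here is a minimal Python/NumPy sketch on simulated data. The two populations, their allele frequencies and the PC1 = 0 cut-off are all invented for the example. It computes principal components from a 0/1/2 genotype matrix after centring and scaling each SNP; on real data you would more likely use a dedicated tool such as PLINK (`--pca`) or EIGENSOFT's smartpca.

```python
import numpy as np

def genotype_pcs(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return the top principal-component scores for each individual.

    genotypes: (n_individuals, n_snps) allele counts 0/1/2, assumed complete.
    """
    g = genotypes.astype(float)
    p = g.mean(axis=0) / 2.0                 # allele frequency per SNP
    keep = (p > 0) & (p < 1)                 # drop monomorphic SNPs (zero variance)
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))   # centre and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Simulated example: two populations with diverged allele frequencies
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(50, n_snps))
pop_b = rng.binomial(2, freq_b, size=(50, n_snps))
pcs = genotype_pcs(np.vstack([pop_a, pop_b]))

# Toy split into two subsets: which side of PC1 = 0 each individual falls on
subset = pcs[:, 0] > 0
```

With clearly diverged groups like these, PC1 alone separates them; with real data you would look at several PCs (and plot them) before deciding where, or whether, to split.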
As for the imputation, there are several "schools". A very "orthodox" approach, as you are suggesting, would be to include HapMap3 (or 1000 Genomes) data for common SNPs in your PCA, in order to find the ethnically closest reference panel for each of your subsets.
You would then impute genotypes in each subpopulation using its closest panel.
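The "closest panel" step could look roughly like the Python sketch below, assuming you have already run a single PCA on study samples and reference individuals together. Here "closest" is taken to mean the smallest Euclidean distance between the subset's centroid and each panel's centroid in PC space; that criterion, the function name and the PC coordinates in the example are assumptions made for illustration (the panel names are HapMap panel codes).

```python
import numpy as np

def closest_panel(study_pcs, ref_pcs, ref_labels):
    """Return the reference panel whose PC-space centroid is nearest to the
    centroid of one study subset (Euclidean distance).

    study_pcs:  (n_study, k) PC scores of one study subset
    ref_pcs:    (n_ref, k) PC scores of reference individuals, same PCA run
    ref_labels: length-n_ref panel names, e.g. "CEU", "YRI", "CHB"
    """
    ref_labels = np.asarray(ref_labels)
    target = study_pcs.mean(axis=0)
    best, best_dist = None, np.inf
    for panel in np.unique(ref_labels):
        centroid = ref_pcs[ref_labels == panel].mean(axis=0)
        dist = np.linalg.norm(centroid - target)
        if dist < best_dist:
            best, best_dist = panel, dist
    return str(best)

# Made-up PC coordinates, only to show the call
ref_pcs = np.array([[5.0, 0.1], [5.2, -0.1],      # CEU
                    [-4.0, 3.0], [-4.1, 2.8],     # YRI
                    [-3.9, -3.2], [-4.2, -3.0]])  # CHB
ref_labels = ["CEU", "CEU", "YRI", "YRI", "CHB", "CHB"]
study_subset = np.array([[4.6, 0.3], [4.9, -0.2]])
print(closest_panel(study_subset, ref_pcs, ref_labels))  # CEU
```

Using centroids is the simplest choice; with well-separated clusters it agrees with what you would conclude by eye from the PCA plot.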
Nevertheless, a more flexible approach was developed recently by Howie and Marchini.
In this approach, the program (IMPUTE) searches a large, ethnically mixed reference panel, separately for each small chromosomal region you want to impute, for the chromosome chunks that are closest to the chromosome being imputed.
If your data show clear ethnic separation (your individuals are 100% European and very divergent from every other population in the panel), you automatically fall back to imputing with an essentially all-European panel.
However, in regions where the populations are less divergent, the imputation will draw on a larger part of the panel.
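To make the per-region idea concrete, here is a deliberately simplified Python sketch: for each region it ranks the haplotypes of a mixed reference panel by Hamming distance to one phased study haplotype and keeps the k closest. This only illustrates the principle; it is not IMPUTE's actual algorithm, which runs a hidden Markov model over the reference haplotypes. The toy panel, the "EUR"/"AFR" labels, the region boundaries and the function name are all invented.

```python
import numpy as np

def nearest_reference_haplotypes(target, reference, k):
    """Indices of the k reference haplotypes closest to `target` in one region.

    target:    (n_snps,) 0/1 alleles of a phased study haplotype in the region
    reference: (n_haps, n_snps) 0/1 alleles of the mixed reference panel
    """
    distances = (reference != target).sum(axis=1)    # Hamming distance per haplotype
    return np.argsort(distances, kind="stable")[:k]  # ties keep panel order

reference = np.array([
    [0, 0, 1, 1, 0, 1, 0],   # EUR
    [0, 0, 1, 1, 1, 1, 0],   # EUR
    [1, 1, 0, 0, 0, 1, 0],   # AFR
    [1, 1, 0, 0, 1, 0, 1],   # AFR
])
labels = np.array(["EUR", "EUR", "AFR", "AFR"])
target = np.array([0, 0, 1, 1, 0, 1, 0])   # a European study haplotype

regions = {"region A": slice(0, 4), "region B": slice(4, 7)}
for name, cols in regions.items():
    idx = nearest_reference_haplotypes(target[cols], reference[:, cols], k=2)
    print(name, labels[idx].tolist())
# region A ['EUR', 'EUR']   (populations diverged: only European haplotypes chosen)
# region B ['EUR', 'AFR']   (populations similar: an African haplotype is as close)
```

No ethnic threshold is imposed anywhere: the panel composition used for each region simply follows from which haplotypes happen to be closest there.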
For me, this approach is theoretically appealing because it generalises the basic population-specific approach, in which you have to impose a threshold. In practice it also seems to work quite well, but it is very new, so I cannot guarantee it 100%.
Beware that the IMPUTE authors now strongly advise pre-phasing before running the imputation. For pre-phasing, it can be useful to have your own data divided into homogeneous populations, but I wouldn't advise populations of fewer than 200 individuals, because phasing needs enough individuals.
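Before pre-phasing, a quick check of subset sizes against the 200-individual rule of thumb can be scripted as below. This is a trivial Python sketch; the cluster labels are made up, and what to do with the too-small subsets (for example, phasing them together with the rest of the data rather than on their own) is left as a judgment call.

```python
from collections import Counter

def plan_prephasing(cluster_labels, min_size=200):
    """Split clusters into those large enough to pre-phase separately
    and those below `min_size` individuals (rule of thumb, not a hard limit)."""
    counts = Counter(cluster_labels)
    separate = sorted(c for c, n in counts.items() if n >= min_size)
    too_small = sorted(c for c, n in counts.items() if n < min_size)
    return separate, too_small

# Hypothetical cluster assignments from the PCA step
labels = ["A"] * 450 + ["B"] * 230 + ["C"] * 60
separate, too_small = plan_prephasing(labels)
print(separate, too_small)  # ['A', 'B'] ['C']
```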
Check this reference for more (and clearer) information:
Genotepes • 950