Question: Genotype Imputation And Population Substructure
6
7.7 years ago by
Ireland/ United Kingdom
Darren J. Fitzpatrick1.1k wrote:

Hi

I have a set of SNPs (~500,000) genotyped in 1000 individuals, but the ethnicity is unknown for a large portion of them.

I wish to impute ungenotyped SNPs from HapMap data. Given that I don't know the ethnicity, I am unsure which HapMap population to use - in fact, I am really unsure how to proceed. Currently, my thinking is as follows:

  1. Infer ethnicity of individuals, perhaps using STRUCTURE

  2. Divide SNP data based on ethnicity

  3. Use the different ethnicity-based subsets to impute unknown genotypes using the relevant HapMap population, e.g., the CEU population for those individuals who are Caucasian

Does this seem reasonable? Would you have any suggestions on how to tackle this problem?

population imputation • 3.0k views
9
7.7 years ago by
Genotepes950
Nantes (France)
Genotepes950 wrote:

Hi,

Actually, one approach would be to identify population substructure (using PCA rather than STRUCTURE, I guess).

If clear patterns emerge, you can divide your population into more homogeneous subsets.

As for the imputation, there are several "schools". A very "orthodox" approach, as you are suggesting, would be to include HapMap3 (or 1000 Genomes) data in your PCA, using the SNPs common to both datasets, in order to find the ethnically closest panel for each of your subset populations. You would then impute genotypes in each sub-population with its closest panel.
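A minimal sketch of this panel-matching step, assuming genotypes are coded 0/1/2 in a samples x SNPs matrix restricted to SNPs shared with the reference panel. The function names, the simulated data, and the panel labels "A" and "B" are all hypothetical (real labels would be HapMap codes such as CEU), and nearest-centroid assignment is just one simple way to pick the closest panel:

```python
import numpy as np

def standardize(G):
    """Center and scale each SNP column of a samples x SNPs matrix."""
    mu = G.mean(axis=0)
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0  # monomorphic SNPs: avoid division by zero
    return (G - mu) / sd

def pca_scores(G, n_components=2):
    """Project samples onto the top principal components via SVD."""
    X = standardize(G.astype(float))
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def closest_panel(scores, ref_labels, n_ref):
    """Assign each study sample to the nearest reference-panel centroid.

    Assumes the first n_ref rows of `scores` are the reference samples.
    """
    ref_labels = np.array(ref_labels)
    ref_scores, study_scores = scores[:n_ref], scores[n_ref:]
    labels = sorted(set(ref_labels.tolist()))
    centroids = np.array(
        [ref_scores[ref_labels == lab].mean(axis=0) for lab in labels]
    )
    # Euclidean distance from every study sample to every centroid
    dist = np.linalg.norm(
        study_scores[:, None, :] - centroids[None, :, :], axis=2
    )
    return [labels[i] for i in dist.argmin(axis=1)]

# Toy data: two simulated panels with unrelated allele frequencies.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.05, 0.95, n_snps)
freq_b = rng.uniform(0.05, 0.95, n_snps)

ref = np.vstack([
    rng.binomial(2, freq_a, size=(20, n_snps)),
    rng.binomial(2, freq_b, size=(20, n_snps)),
])
ref_labels = ["A"] * 20 + ["B"] * 20

study = np.vstack([
    rng.binomial(2, freq_a, size=(10, n_snps)),
    rng.binomial(2, freq_b, size=(10, n_snps)),
])

# Run PCA on reference and study samples together, then match panels.
scores = pca_scores(np.vstack([ref, study]), n_components=2)
assigned = closest_panel(scores, ref_labels, n_ref=len(ref))
print(assigned)
```

With real data the separation is rarely this clean, so plotting the first few components before committing to a split is worthwhile.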

Nevertheless, a more flexible approach was developed recently by Howie and Marchini.

In this approach, for each small chromosomal region that you want to impute, the program (IMPUTE) searches a large, ethnically mixed panel for the chromosome chunks that are closest to the chromosome being imputed.

If your data show clear ethnic separation (say your individuals are 100% European and very divergent from any other panel population), then you will automatically be back to imputing with a 100% European panel. However, if some regions show less divergence between populations, then for these regions the imputation will draw on a larger panel. For me, this approach is theoretically appealing because it is a kind of generalisation of the basic population-specific approach, where you have to impose a threshold. It also seems to work quite well in practice, but it is very new, so I cannot guarantee it 100%.
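To illustrate the idea of picking, region by region, the reference chunks closest to the chromosome being imputed, here is a toy sketch. It is not IMPUTE's actual implementation: it assumes phased 0/1 haplotypes over one small window and ranks reference haplotypes by the number of mismatching alleles (Hamming distance). The function name and the data are hypothetical:

```python
import numpy as np

def closest_reference_haplotypes(study_hap, ref_haps, k):
    """Return indices of the k reference haplotypes with the fewest
    allele mismatches against the study haplotype in this window."""
    mismatches = (ref_haps != study_hap).sum(axis=1)
    # Stable sort keeps the original order among ties.
    return np.argsort(mismatches, kind="stable")[:k]

# Mixed reference panel: rows are haplotypes, columns are typed SNPs.
ref_haps = np.array([
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 1, 1, 1, 0, 1],
])
study_hap = np.array([0, 0, 1, 1, 0, 1])

chosen = closest_reference_haplotypes(study_hap, ref_haps, k=3)
print(chosen.tolist())
```

Repeating this selection per window is what lets a well-separated sample end up using only haplotypes from one population, while less divergent regions pull in haplotypes from several.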

Beware that IMPUTE now strongly advises pre-phasing before running the imputation. For this pre-phasing, it can be worthwhile to divide your own data into homogeneous populations. But I wouldn't advise populations of fewer than 200 individuals, because you need enough individuals for phasing.
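A small sketch of applying the 200-individual guideline when forming phasing groups. Pooling the undersized groups together is an assumption made here for illustration, not something the IMPUTE documentation prescribes, and the labels "EUR", "AFR", and "EAS" are hypothetical cluster names:

```python
from collections import Counter

MIN_PHASING_GROUP = 200  # guideline from the answer above

def phasing_groups(labels, min_size=MIN_PHASING_GROUP):
    """Map each group name to the sample indices to phase together.

    Groups smaller than min_size are merged into a single "pooled"
    group (an illustrative policy, not an IMPUTE requirement).
    """
    counts = Counter(labels)
    groups = {}
    for idx, lab in enumerate(labels):
        key = lab if counts[lab] >= min_size else "pooled"
        groups.setdefault(key, []).append(idx)
    return groups

# Hypothetical cluster assignments for 1000 individuals.
labels = ["EUR"] * 650 + ["AFR"] * 300 + ["EAS"] * 50
groups = phasing_groups(labels)
for name, members in sorted(groups.items()):
    print(name, len(members))
```

Note that the pooled group can itself fall below the threshold, in which case phasing everyone together may be the safer choice.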

Best

Christian

Check this reference for more (and clearer) information,

http://www.g3journal.org/content/1/6/457.full

1

Nice answer. Would PCA actually separate out the populations nicely?

Reply written 7.7 years ago by brentp (23k)
1

Hi. This would separate them quite nicely indeed. There is a minor issue with the number of dimensions, but for intercontinental differences the first two axes will do.

Reply written 7.7 years ago by Genotepes (950)

@genotepes: Thanks for that!

Reply written 7.7 years ago by Darren J. Fitzpatrick (1.1k)

I agree with Genotepes.

You can do the imputation with IMPUTE2, use the complete HapMap reference panel, and not worry about the population structure at all!

IMPUTE2 includes algorithms that choose the optimal subset of the reference panel for you. This is better than subsetting the population yourself, since it is very easy to introduce bias.

Check this: http://mathgen.stats.ox.ac.uk/impute/using_multi_population_reference_panels.html#how_does_it_work

Reply written 5.5 years ago by Kantale (120)