Question

Formulating the best imputation strategy for multiple GWAS chips in an admixed population

8.5 years ago

LauferVA 4.2k

I want to find the best way to impute 5 different chips (Omni 1M, Omni 1S, MEGA, Omni 5M, Immunochip) for a GWAS study in African Americans.

The people genotyped on each array overlap: some people genotyped on MEGA, for instance, are also genotyped on Omni 1M.

The number of people genotyped on each array is:

Omni 1M - 1200

Omni 1S - 1200

MEGA - 485

Omni 5M - 985

Immunochip - 1400

I also have CGI whole genome sequencing (38x) on 62 people. These people have genotyping data on one or more platforms, and the sequencing data is high quality, especially for common variants.

The MEGA array in particular is supposed to contain many variants found in people of African ancestry, and the Omni 5M has about 5 million SNPs, so its density is fairly good.

Now with all that as background, my question is: what is the most sensible imputation strategy?

Should I merge all the arrays into one dataset and impute on the merged data?

Or should I try a more complex approach, for example judging imputation accuracy by comparing imputed genotypes to markers that were directly genotyped in the same sample on a different chip? Is that likely to matter to the final analysis, or is it probably academic? And finally, is there a clearly best program these days? Should I use SHAPEIT2? Genotype Harmonizer seems attractive because it does phasing and strand flipping across chips as well, but is it as good?
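To make the cross-chip accuracy check concrete, here is a minimal sketch of the comparison I have in mind for a single marker. It assumes the imputed values are alternate-allele dosages between 0 and 2, that the calls from the other chip are allele counts (0, 1 or 2) on the same strand and allele coding, and that missing values are `None`. The function names are hypothetical and not taken from any imputation package.

```python
def _paired(imputed_dosages, genotyped):
    """Keep only samples with both an imputed dosage and a genotyped call."""
    return [(d, g) for d, g in zip(imputed_dosages, genotyped)
            if d is not None and g is not None]


def best_guess_concordance(imputed_dosages, genotyped):
    """Fraction of samples whose rounded dosage equals the genotyped allele count."""
    pairs = _paired(imputed_dosages, genotyped)
    if not pairs:
        raise ValueError("no samples with both imputed and genotyped calls")
    # Round half up to the nearest allele count, capped at 2.
    matches = sum(1 for d, g in pairs if min(2, int(d + 0.5)) == g)
    return matches / len(pairs)


def dosage_r2(imputed_dosages, genotyped):
    """Squared Pearson correlation between imputed dosages and genotyped counts."""
    pairs = _paired(imputed_dosages, genotyped)
    if not pairs:
        raise ValueError("no samples with both imputed and genotyped calls")
    n = len(pairs)
    mean_d = sum(d for d, _ in pairs) / n
    mean_g = sum(g for _, g in pairs) / n
    cov = sum((d - mean_d) * (g - mean_g) for d, g in pairs)
    var_d = sum((d - mean_d) ** 2 for d, _ in pairs)
    var_g = sum((g - mean_g) ** 2 for _, g in pairs)
    if var_d == 0 or var_g == 0:
        # Monomorphic in this sample set: correlation is undefined.
        return float("nan")
    return cov * cov / (var_d * var_g)


# Made-up values for one marker imputed from one chip and genotyped on another.
imputed = [0.1, 0.9, 1.8, 0.0, 1.2, None]
truth = [0, 1, 2, 0, 2, 1]
print(best_guess_concordance(imputed, truth))  # 4 of the 5 usable calls match
print(dosage_r2(imputed, truth))
```

For rare variants, concordance is dominated by the common homozygote, so the dosage correlation is probably the more informative of the two numbers.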

With such questions in mind, I would appreciate any advice on a practical imputation strategy for such data.

GWAS imputation admixture meta analysis • 2.2k views

I am looking at a similar scenario. Which approach did you find worked best?

7.2 years ago by JustGettinStarted