Question

Steps for genotype imputation?


4.1 years ago

rk ▴ 30

Hi

I'm looking to do genotype imputation for 10,000 children. I have no clue where to start, as I haven't done this before. I would greatly appreciate advice (e.g. how long does it take, and what should I be cautious of?) and any scripts lying about.

Thanks

SNP • 1.5k views

updated 4.1 years ago by Kevin Blighe 87k • written 4.1 years ago by rk ▴ 30

score 3 · Accepted Answer · 2020-03-13

To be honest, imputation is not an easy task if you have never done it before. I am curious to understand how / why you are getting into it? Supervisor...?

The general process proceeds in 2 main steps:

pre-phasing of genotypes into haplotypes (against a reference)
imputation of genotypes against a reference

A lot of the work in this area came from a group in Oxford, where the following programs were developed:

SHAPEIT (pre-phasing)
IMPUTE2 (imputation)
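
As a rough illustration of how the two steps fit together, the sketch below only prints a SHAPEIT pre-phasing command and an IMPUTE2 imputation command for one region; it does not run either program. All file names (gwas.chr1.*, genetic_map_chr1.txt, ref_chr1.*) are hypothetical placeholders, the thread count and the -Ne value of 20000 are assumptions for illustration, and every flag should be checked against the documentation of the versions you actually install.

```shell
#!/usr/bin/env bash
# Dry-run sketch: builds and prints the commands, nothing is executed.
# All file names below are hypothetical placeholders.

# Step 1: pre-phase the genotypes of one chromosome into haplotypes.
prephase_cmd() {
  local chr="$1"
  echo "shapeit" \
    "--input-bed gwas.chr${chr}.bed gwas.chr${chr}.bim gwas.chr${chr}.fam" \
    "--input-map genetic_map_chr${chr}.txt" \
    "--output-max gwas.chr${chr}.phased.haps gwas.chr${chr}.phased.sample" \
    "--thread 8"
}

# Step 2: impute one interval of that chromosome from the pre-phased
# haplotypes against a reference panel (hap/legend files).
impute_cmd() {
  local chr="$1" start="$2" end="$3"
  echo "impute2" \
    "-use_prephased_g" \
    "-known_haps_g gwas.chr${chr}.phased.haps" \
    "-h ref_chr${chr}.hap.gz" \
    "-l ref_chr${chr}.legend.gz" \
    "-m genetic_map_chr${chr}.txt" \
    "-int ${start} ${end}" \
    "-Ne 20000" \
    "-o gwas.chr${chr}.${start}_${end}.impute2"
}

# Example: show the commands for the first 5 Mb of chromosome 1.
prephase_cmd 1
impute_cmd 1 1 5000000
```

Printing the commands first makes it easy to eyeball them, or to write them to a job file for a scheduler, before committing days of compute time.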

Another commonly used program is called Beagle. There is also the Michigan Imputation Server, which is probably easier if you have never done this before.

how long does it take? and things to be cautious of

...how long is a piece of string? It depends on many factors, including your compute resources, the size of your dataset, the size of the reference panel, etc. To give you an idea, I recently imputed an Illumina GSA dataset against 1000 Genomes Phase III, and it took ~2 weeks of constant processing (32 cores; 32GB RAM), and probably 1.5 months in total when you consider everything else (script development, dealing with errors, etc).

Another key point is that, unless you have full access to a Cray supercomputer, you'll have to do the imputation in chunks, looped across each chromosome, e.g. in 5-megabase chunks. The imputation programs are intelligent enough to impute a 'buffer' window outside of each chunk to ensure a harmonious overlap between neighbouring chunks. These chunks then have to be pieced back together at the end.
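
The chunking and re-assembly can be sketched roughly as follows. This is a minimal illustration and not a complete pipeline: the function names, the `<prefix>.<start>_<end>.impute2` naming scheme, and the default chunk size of 5 Mb are my assumptions, and the chromosome length has to be supplied by you (e.g. from the reference build you are using).

```shell
#!/usr/bin/env bash
# Sketch: split a chromosome into fixed-size chunks and later
# concatenate the per-chunk results in genomic order.
# The file naming scheme is a hypothetical convention.

# Print one "start end" pair per line, covering 1..chr_len.
chunk_intervals() {
  local chr_len="$1" chunk_size="${2:-5000000}"
  local start=1 end
  while [ "$start" -le "$chr_len" ]; do
    end=$(( start + chunk_size - 1 ))
    # The last chunk is clipped to the end of the chromosome.
    if [ "$end" -gt "$chr_len" ]; then end="$chr_len"; fi
    echo "$start $end"
    start=$(( end + 1 ))
  done
}

# Concatenate the chunk outputs for one chromosome to stdout,
# in the same order in which the intervals were generated.
merge_chunks() {
  local prefix="$1" chr_len="$2" chunk_size="${3:-5000000}"
  local start end f
  chunk_intervals "$chr_len" "$chunk_size" | while read -r start end; do
    f="${prefix}.${start}_${end}.impute2"
    # A chunk may have produced no output file (e.g. no variants in
    # the interval); such chunks are skipped here.
    if [ -f "$f" ]; then cat "$f"; fi
  done
}
```

For example, `chunk_intervals 12000000` prints three intervals (1 to 5000000, 5000001 to 10000000, and 10000001 to 12000000), each of which could be passed to one imputation job; `merge_chunks gwas.chr1 12000000 > gwas.chr1.impute2` would then stitch the results together. Generating the intervals with the same function in both places keeps the merge order consistent with the order in which the jobs were defined.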

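NB - you can convert IMPUTE2's phased haplotype output (the _haps file) to VCF via SHAPEIT:

# GEN is assumed to hold the IMPUTE2 output prefix, so "${GEN}_haps" is the haplotype file.
# --output-vcf expects the name of the VCF file to write.
shapeit \
      -convert \
      --input-haps "${GEN}_haps" \
      --output-vcf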

Scripts? - you'll find a lot spread across the World Wide Web. For example, I have scripts for pre-phasing, here: C: Phasing with SHAPEIT

Kevin