Question: Imputation from plink
Asked 14 months ago by michael.flower.14 (United Kingdom):

I’ve got GWAS data in plink format (bed, bim, fam). I need to impute some SNPs that weren’t directly genotyped. I’ve read that I need to phase, e.g. with shapeit, then impute, e.g. with impute2. I’m having trouble figuring out which genetic map to use for shapeit (the one in their guide doesn’t work). What would be really helpful is a step-by-step guide to go from plink files to imputed SNPs, as this process seems quite painful. Here's what I've done:

1. I'm on a Mac, so I had to set up an Ubuntu VirtualBox to run shapeit (shapeit.v2.904.3.10.0-693.11.6.el7.x86_64). The example data from the tutorial works.
2. I've got GWAS data in plink format (TRACK-HD_v3_qc_imputed_v3.bed, TRACK-HD_v3_qc_imputed_v3.bim, TRACK-HD_v3_qc_imputed_v3.fam).
3. I read the shapeit documentation, which says under 'Genetic map' to click a link to download the map for human populations.
4. My GWAS is in GRCh37, so I want to download the 'HapMap phase II b37' map; however, this link doesn't work.
5. I've been looking for an alternative genetic map. First I went to HapMap, but that's been retired. I went to their archive, but it's not at all clear which file to use as the map. I also read that 1KG can be used as a map, so I went there, but again it's not clear which file to use as a map.

I've tried using this as a genetic map - 'genetic_map_chr1_combined_b36.txt', but I get the following:

michael@michael-VirtualBox:~/bin/shapeit.v2.904.3.10.0-693.11.6.el7.x86_64/bin$ ./shapeit --input-bed TRACK-HD_v3_qc_imputed_v3.bed TRACK-HD_v3_qc_imputed_v3.bim TRACK-HD_v3_qc_imputed_v3.fam --input-map genetic_map_chr1_combined_b36.txt --output-max TRACK-HD_v3_qc_imputed_v3_phased.haps TRACK-HD_v3_qc_imputed_v3_phased.sample

Segmented HAPlotype Estimation & Imputation Tool
  * Authors : Olivier Delaneau, Jared O'Connell, Jean-François Zagury, Jonathan Marchini
  * Contact : send an email to the OXSTATGEN mail list
  * Webpage :
  * Version : v2.r904
  * Date    : 24/11/2019 14:30:04
  * LOGfile : [shapeit_24112019_14h30m04s_8d07a6e7-5f9d-45c0-8e20-706fd12a0ba6.log]

  * Autosome (chr1 ... chr22)
  * Window-based model (SHAPEIT v2)
  * MCMC iteration

Parameters :
  * Seed : 1574605804
  * Parallelisation: 1 threads
  * Ref allele is NOT aligned on the reference genome
  * MCMC: 35 iterations [7 B + 1 runs of 8 P + 20 M]
  * Model: 100 states per window [100 H + 0 PM + 0 R + 0 COV ] / Windows of ~2.0 Mb / Ne = 15000

Reading site list in [TRACK-HD_v3_qc_imputed_v3.bim]

ERROR: Duplicate site pos=40345847 ref=A alt=AAAC
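
The error above indicates that the .bim file lists the same position more than once. A minimal sketch for finding such rows, assuming the standard six-column .bim layout (chromosome, variant ID, genetic distance, base-pair position, allele 1, allele 2). The log does not say whether SHAPEIT also compares alleles, so this sketch flags by chromosome and position only. The function name and the example rows are hypothetical, not taken from the poster's data.

```python
from collections import defaultdict


def find_duplicate_sites(bim_lines):
    """Return {(chrom, pos): [variant IDs]} for positions listed more than once."""
    sites = defaultdict(list)
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed rows
        chrom, var_id, _cm, pos = fields[:4]
        sites[(chrom, int(pos))].append(var_id)
    # Keep only positions that occur on two or more rows.
    return {site: ids for site, ids in sites.items() if len(ids) > 1}


# Made-up rows for illustration; in practice, pass an open .bim file handle.
example = [
    "1 rs_a 0 40345847 A AAAC",
    "1 rs_b 0 40345847 A G",
    "1 rs_c 0 40350000 C T",
]
duplicates = find_duplicate_sites(example)
print(duplicates)
```

The variant IDs found this way could be written one per line to a text file and passed to plink's `--exclude` option before phasing. Whether to drop all copies or keep one is a judgment call that depends on the data.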

All in all, I'm a bit fed up and going to have a break from this for a while. Any help for when I come back to it later this evening would be much appreciated! Thanks!

Tags: plink, impute2, shapeit

I have recently gone through this entire process (pre-phasing and then imputation) for 2 different cohorts. What do you mean when you say that the shapeit genetic map "doesn't work"? PS - indeed, it is painful (and should not be necessary).

Comment written 14 months ago by Kevin Blighe

Thanks, I've updated above

Reply written 14 months ago by michael.flower.14

Thank you. Yes, some of the links are broken, unfortunately.

I obtained my genetic maps from here: "Download Reference Data"

I then show how SHAPEIT is essentially a 3 stage process, here: C: Strand Alignment in ShapeIt -- "Reference and Main panels are not well aligned"
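
As a rough illustration of the phasing step only (not the three-stage procedure in the linked thread), here is a sketch that builds one plink split command and one shapeit command per autosome. It assumes one run per chromosome, since the genetic map files are per chromosome (the poster's file is for chr1). It also assumes a build 37 map is wanted, because the poster's data are GRCh37 while the map that was tried is labelled b36. The map file name pattern is modeled on the poster's file name and is an assumption, as are the function name and the output prefixes. The shapeit options are the ones used in the question; the plink options are `--bfile`, `--chr`, `--make-bed` and `--out`.

```python
def per_chromosome_commands(bfile, map_pattern, chromosomes=range(1, 23)):
    """Build (plink_cmd, shapeit_cmd) argument lists for each chromosome."""
    commands = []
    for chrom in chromosomes:
        prefix = f"{bfile}.chr{chrom}"  # hypothetical per-chromosome prefix
        # Extract a single chromosome into its own bed/bim/fam set.
        plink_cmd = [
            "plink", "--bfile", bfile,
            "--chr", str(chrom),
            "--make-bed", "--out", prefix,
        ]
        # Phase that chromosome with the matching per-chromosome map.
        shapeit_cmd = [
            "shapeit",
            "--input-bed", f"{prefix}.bed", f"{prefix}.bim", f"{prefix}.fam",
            "--input-map", map_pattern.format(chrom=chrom),
            "--output-max", f"{prefix}.phased.haps", f"{prefix}.phased.sample",
        ]
        commands.append((plink_cmd, shapeit_cmd))
    return commands


commands = per_chromosome_commands(
    "TRACK-HD_v3_qc_imputed_v3",
    "genetic_map_chr{chrom}_combined_b37.txt",  # assumed b37 naming
)
# Print the commands for chromosome 1 so they can be inspected before running.
for cmd in commands[0]:
    print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.run(cmd, check=True)` once the printed commands look right. Building the commands separately from running them makes it easier to check file names first.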

Reply written 14 months ago by Kevin Blighe