Question: Rare variants imputation
2.1 years ago, reds.nik wrote:


I am currently working on rare-variant association studies using whole-genome sequencing (WGS) data. I want to replicate my results in an array-genotyped cohort in which rare variants were previously imputed with the Michigan Imputation Server using the Haplotype Reference Consortium panel. However, I found that my top hits were poorly imputed, so I would like to re-impute rare variants against a custom reference panel built from my own WGS data. I see that IMPUTE2 and SHAPEIT are widely suggested for this purpose, but I can't find a clear explanation of how to generate a reference panel. I have per-individual VCF files and PLINK files, as well as a merged PLINK PED file that includes all the samples.

Could anybody kindly suggest a tutorial or resource where I can learn how to do this? Is there a better strategy for imputing the rare variants missed by the Michigan server than generating my own reference panel?

Thanks in advance for any help.

2.1 years ago, Kevin Blighe (Republic of Ireland) wrote:

I have yet to see a good tutorial for this either, and the documentation for these programs is rarely great.

As you already have your data in PLINK format, you can export it straight to GEN format for IMPUTE2 with the --recode oxford command-line option. You should then be able to use the result directly as a reference panel in IMPUTE2.
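As a rough sketch of that export step, the snippet below only assembles the PLINK command line without running it. The file prefixes `merged` and `merged_ref_chr22` and the helper name `plink_recode_oxford_cmd` are placeholders, not names from this thread. `--bfile`, `--file`, `--chr`, `--recode oxford` and `--out` are standard PLINK 1.9 options.

```python
import shlex

def plink_recode_oxford_cmd(prefix, out_prefix, chrom=None, binary=True):
    """Build (but do not run) a PLINK command that writes Oxford .gen/.sample files."""
    # --bfile reads a binary .bed/.bim/.fam fileset; --file reads text .ped/.map
    cmd = ["plink", "--bfile" if binary else "--file", prefix]
    if chrom is not None:
        # Optional: restrict the export to a single chromosome
        cmd += ["--chr", str(chrom)]
    cmd += ["--recode", "oxford", "--out", out_prefix]
    return cmd

# Hypothetical prefixes, for illustration only
cmd = plink_recode_oxford_cmd("merged", "merged_ref_chr22", chrom=22)
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it, assuming `plink` is on the PATH. For a text PED/MAP fileset, call the helper with `binary=False`.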

Edit December 11, 2019: if you are planning to do pre-phasing into 1000 Genomes haplotypes using SHAPEIT, then SHAPEIT can read directly from PLINK format.
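A pre-phasing call of that kind might be assembled as in the sketch below, which again only builds the command. All file names and the helper `shapeit_prephase_cmd` are hypothetical. `--input-bed`, `--input-map`, `--input-ref`, `--output-max` and `--thread` are SHAPEIT2 options. In practice SHAPEIT usually also needs an alignment check against the reference first, which is not shown here.

```python
def shapeit_prephase_cmd(bfile_prefix, genetic_map, out_prefix, ref=None, threads=1):
    """Build (but do not run) a SHAPEIT2 command that phases a PLINK binary fileset."""
    cmd = [
        "shapeit",
        # SHAPEIT reads the PLINK .bed/.bim/.fam triplet directly
        "--input-bed",
        f"{bfile_prefix}.bed", f"{bfile_prefix}.bim", f"{bfile_prefix}.fam",
        "--input-map", genetic_map,
    ]
    if ref is not None:
        # ref is a (haplotypes, legend, sample) triple, e.g. a 1000 Genomes panel
        cmd += ["--input-ref", *ref]
    cmd += [
        "--output-max", f"{out_prefix}.haps", f"{out_prefix}.sample",
        "--thread", str(threads),
    ]
    return cmd

# Hypothetical file names, for illustration only
phase_cmd = shapeit_prephase_cmd(
    "gwas_chr22",
    "genetic_map_chr22.txt",
    "gwas_chr22.phased",
    ref=("ref_chr22.hap.gz", "ref_chr22.legend.gz", "ref.sample"),
    threads=4,
)
print(" ".join(phase_cmd))
```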

Alternatively, starting from the VCF stage, I would merge your samples into a single VCF and then use this script to convert VCF format to GEN.
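To give an idea of what such a conversion involves, here is a minimal sketch. It is not the linked script. It assumes biallelic sites with hard-called GT fields, and writes each genotype as the three GEN probabilities (hom-ref, het, hom-alt), with missing calls as `0 0 0`. Putting the chromosome in the first GEN column and falling back to `chrom:pos` for missing IDs are choices made for this example. Phase information is discarded.

```python
def vcf_line_to_gen(line):
    """Convert one biallelic VCF data line with hard-called GT fields to a GEN line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt = fields[:5]
    if "," in alt:
        raise ValueError("multiallelic site; split it before conversion")
    # Locate GT within the FORMAT column (the VCF spec puts it first when present)
    gt_index = fields[8].split(":").index("GT")
    # Fall back to chrom:pos when the VCF carries no variant ID
    rsid = vid if vid != "." else f"{chrom}:{pos}"
    out = [chrom, rsid, pos, ref, alt]
    for sample in fields[9:]:
        alleles = sample.split(":")[gt_index].replace("|", "/").split("/")
        probs = ["0", "0", "0"]
        if len(alleles) == 2 and "." not in alleles:
            # The number of ALT alleles (0, 1 or 2) selects which probability is 1
            probs[sum(a == "1" for a in alleles)] = "1"
        out += probs
    return " ".join(out)

# Made-up variant with four samples: 0/0, 0|1, 1/1 and a missing call
example = "22\t16050075\tvar1\tA\tG\t.\tPASS\t.\tGT\t0/0\t0|1\t1/1\t./."
print(vcf_line_to_gen(example))
# → 22 var1 16050075 A G 1 0 0 0 1 0 0 0 1 0 0 0
```

Header lines starting with `#` would need to be skipped by the caller, and the matching .sample file has to be written separately.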

I would also consider creating a merged reference panel that consists of your data plus 1000 Genomes. There have been posts on this in the past but it's not a frequent topic. Here, I am trying to link all of them:



Thanks a lot, Kevin, for your suggestions. I managed to create my own reference panel with the --recode oxford option in PLINK. However, as you said, given my relatively small sample size, merging my reference panel with 1000 Genomes data could be a better choice.

Reply written 2.1 years ago by reds.nik