Question: How to convert IMPUTE2 to VCF format
lkmklsmn (United States) wrote 5.2 years ago:


I have imputed genotypes using IMPUTE2 (version 2.3.2). The output files look like this:








Now I want to convert these files into VCF format. How can I do this? 



Feeling pretty peeved about this.

bcftools expects a sample file with the extension .samples, not .sample, which is what plink produces.

It doesn't check whether the file exists and doesn't produce an error message. It just crashes with a segmentation fault if no file with that name is there. I had to run the program in gdb to find this out.

So the command is this: bcftools convert --gensample2vcf test

And the files needed are these: test.gen.gz test.samples
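
A minimal sketch of the workaround described above, assuming a test.sample file sits next to test.gen.gz. The helper name ensure_gensample_inputs is hypothetical. It copies the .sample file to the .samples name that bcftools looks for, and it reports a missing input in plain words instead of leaving it to a segmentation fault.

```shell
# Hypothetical helper: check that PREFIX.gen.gz and PREFIX.samples exist,
# creating PREFIX.samples from PREFIX.sample when only the latter is present.
ensure_gensample_inputs() {
    prefix=$1
    # plink writes PREFIX.sample; bcftools (as described above) wants PREFIX.samples
    if [ ! -f "${prefix}.samples" ] && [ -f "${prefix}.sample" ]; then
        cp "${prefix}.sample" "${prefix}.samples"
    fi
    # Report any missing input by name and return a non-zero status
    for f in "${prefix}.gen.gz" "${prefix}.samples"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
    done
    return 0
}
```

With the function defined, `ensure_gensample_inputs test && bcftools convert --gensample2vcf test` runs the conversion only when both inputs are present. Some bcftools versions also accept the two file names spelled out as gen-file,sample-file, which would avoid the renaming; check `bcftools convert --help` for yours.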

written 4.6 years ago by davenomiddlenamecurtis20

Thanks, the .samples naming got me as well. But now I get: Could not parse REF in "CHROM:POS_REF_ALT id: rs7899632:100000625:A:G". Grr.

written 4.4 years ago by wuttke0

You can fix the identifier using awk:

awk '{split($2,a,":"); $2="10:"a[2]"_"a[3]"_"a[4]; print}'
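
To see what the one-liner does, here it is applied to a single made-up .gen line that carries the identifier from the error message. The genotype columns are invented for illustration. The hard-coded 10 is the chromosome, so it would need changing for data from another chromosome.

```shell
# One invented .gen line; only the ID in column 2 comes from the error message above
line='--- rs7899632:100000625:A:G 100000625 A G 0 0 1'

# Split column 2 on ":" and rebuild it as CHROM:POS_REF_ALT (chromosome fixed to 10)
fixed=$(printf '%s\n' "$line" | awk '{split($2,a,":"); $2="10:"a[2]"_"a[3]"_"a[4]; print}')

echo "$fixed"
# prints: --- 10:100000625_A_G 100000625 A G 0 0 1
```

For a real file the same awk would sit in a pipeline, for example `gzip -dc test.gen.gz | awk '...' | gzip > fixed.gen.gz`, where the output file name is an assumption.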

written 4.4 years ago by plott0

Looks like BCFtools has an option for exactly that. Check out the bcftools convert --gensample2vcf ... command.

written 5.2 years ago by Sean210

Seems like this command wants two files (a gen file and a sample file). However, my IMPUTE2 output (using default options) does not contain a sample file. This has also been the issue with other conversion tools such as gtool and plink.

written 5.2 years ago by lkmklsmn

IMPUTE2 is written to be sample agnostic by default. The IMPUTE2 documentation states: "Currently, the only reason to provide a sample file is if you want to exclude some individuals". You should get the samples from the input dataset you give to IMPUTE2.

written 4.4 years ago by plott0
Alexander Skates (United Kingdom) wrote 5.2 years ago:



This way worked. It did not require a sample file. Thanks.

Important edit:

qctool does NOT retain phasing information when converting to VCF format.

modified 4.9 years ago • written 5.2 years ago by lkmklsmn
dweeks.pitt (United States) wrote 3.9 years ago:

While qctool will convert to VCF format without a sample file, this is only useful if you do not need to know the sample IDs. Without a sample file, it generates dummy IDs in the VCF like this:

 sample_1        sample_2        sample_3        sample_4 ...

In all the work we do, we need to know which sample is which, so we have to keep track of sample IDs and can't use dummy sample IDs.

If you do have a sample file in addition to your gen file, then Mega2 can convert from IMPUTE2 format to VCF format. See the Mega2 documentation for details.
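
If the dummy-ID VCF already exists, one possible way to put real IDs back is to pull them from the sample file of the dataset that went into IMPUTE2 and rename the VCF columns afterwards. The sketch below is not from the posts above. It assumes an Oxford-format .sample file (two header lines, individual ID in column 2) whose rows are in the same order as the genotype columns; the helper name sample_ids and all file names are hypothetical.

```shell
# Hypothetical helper: print one sample ID per line from an Oxford-format
# .sample file, skipping its two header lines.
# Assumption: column 2 (ID_2) is the individual ID you want in the VCF.
sample_ids() {
    awk 'NR > 2 { print $2 }' "$1"
}
```

A possible use is `sample_ids study.sample > names.txt` followed by `bcftools reheader -s names.txt -o renamed.vcf.gz dummy_ids.vcf.gz`. It is worth confirming that the number of IDs matches the number of sample columns before trusting the result.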
