How to convert IMPUTE2 to VCF format
8.7 years ago
lkmklsmn ▴ 970

Hi,

I have imputed genotypes using IMPUTE2 (version 2.3.2). The output files look like this:

file
file_allele_probs
file_haps
file_info
file_info_by_sample
file_summary
file_warnings

Now I want to convert these files into VCF format. How can I do this?

IMPUTE2 VCF conversion • 12k views

Feeling pretty peeved about this.

bcftools expects a sample file with the extension .samples, not .sample, which is what plink produces.

It doesn't check whether the file exists and doesn't print an error message. It just crashes with a segmentation fault if no file with that name is there. I had to run the program in gdb to find this out.

So the command is this:

bcftools convert --gensample2vcf test

And the files needed are these:

test.gen.gz test.samples
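
To avoid hitting the segfault, it may help to check the file names before calling bcftools. A rough sketch, assuming the prefix convention above; the function name prepare_gensample is made up for illustration and is not part of bcftools or plink:

```shell
# Hypothetical helper: make sure <prefix>.gen.gz and <prefix>.samples exist
# before handing the prefix to bcftools.
prepare_gensample() {
    prefix="$1"
    if [ ! -f "$prefix.gen.gz" ]; then
        echo "missing $prefix.gen.gz" >&2
        return 1
    fi
    # plink writes <prefix>.sample; copy it to the name bcftools looks for
    if [ ! -f "$prefix.samples" ]; then
        if [ -f "$prefix.sample" ]; then
            cp "$prefix.sample" "$prefix.samples"
        else
            echo "missing $prefix.samples (and no $prefix.sample to copy)" >&2
            return 1
        fi
    fi
}

# Usage (not run here; needs bcftools and real data):
#   prepare_gensample test && bcftools convert --gensample2vcf test
```

The copy leaves the original .sample file in place, so plink can still read it.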

Thanks, the .samples extension got me as well. But now I get this error: Could not parse REF in "CHROM:POS_REF_ALT id: rs7899632:100000625:A:G" grr


You can fix the identifier using awk:

awk '{split($2,a,":"); $2="10:"a[2]"_"a[3]"_"a[4]; print}'
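
Note that the one-liner hard-codes chromosome 10. A variant that takes the chromosome as an argument might look like this; fix_gen_ids is a hypothetical name, and the example line is made up to match the identifier in the error message:

```shell
# Rewrite column 2 of a .gen line from rsID:POS:REF:ALT to CHROM:POS_REF_ALT.
# The chromosome is passed as the first argument instead of being hard-coded.
fix_gen_ids() {
    awk -v chr="$1" '{split($2, a, ":"); $2 = chr ":" a[2] "_" a[3] "_" a[4]; print}'
}

# Example on one made-up line (reads stdin, writes stdout):
printf '%s\n' '--- rs7899632:100000625:A:G 100000625 A G 1 0 0' | fix_gen_ids 10
```

On a real file, something like gunzip -c test.gen.gz | fix_gen_ids 10 | gzip -c > fixed.gen.gz should do it; the output name is arbitrary.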

Looks like BCFtools has an option for exactly that. Check out the bcftools convert --gensample2vcf ... command in the BCFtools documentation.


It seems this command wants two files (a gen file and a sample file). However, my IMPUTE2 output (using default options) does not contain a sample file. I ran into the same issue with other conversion tools such as gtool and plink.


IMPUTE2 is written to be sample agnostic by default. The IMPUTE2 documentation states: "Currently, the only reason to provide a sample file is if you want to exclude some individuals". You should take the sample file from the input dataset you gave to IMPUTE2.
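
Before pairing a sample file with the IMPUTE2 output, a quick sanity check is to compare the number of individuals in each. This is only a sketch: it assumes the genotype file has 5 leading columns followed by 3 probabilities per individual, that the sample file has 2 header lines, and that the genotype file is uncompressed. Both function names are made up:

```shell
# Number of individuals implied by the first line of a genotype file:
# (total columns - 5 leading columns) / 3 probabilities per individual.
count_gen_samples() {
    head -n 1 "$1" | awk '{print (NF - 5) / 3}'
}

# Number of individuals in an Oxford-format sample file:
# every non-empty line after the two header lines.
count_sample_rows() {
    awk 'NR > 2 && NF > 0 {n++} END {print n + 0}' "$1"
}

# Usage (file names are placeholders):
#   [ "$(count_gen_samples file)" = "$(count_sample_rows input.sample)" ] \
#       || echo "sample count mismatch" >&2
```

A match does not prove the order is the same, only that the counts agree.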

8.7 years ago

Try QCTOOL.


This worked, and it did not require a sample file. Thanks.

Important edit:

qctool does NOT retain phasing information when converting to VCF format.

7.5 years ago
dweeks.pitt ▴ 40

While qctool will convert to VCF format without a sample file, this is only useful if you do not need the sample IDs, because it then generates dummy IDs in the VCF like this:

 sample_1        sample_2        sample_3        sample_4 ...

In all the work we do, we need to know which sample is which, so we have to keep track of sample IDs and can't use dummy sample IDs.
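
If the sample file from the IMPUTE2 input is still around, one possible workaround is to pull the real IDs out of it and rename the dummy samples with bcftools reheader. A sketch only: it assumes the usual Oxford sample layout (two header lines, individual ID in column 2) and that the sample order matches the order in the VCF. The function name and file names are placeholders:

```shell
# Print the individual IDs (column 2) from an Oxford-format sample file,
# skipping its two header lines. One ID per line, in file order.
sample_ids() {
    awk 'NR > 2 && NF > 0 {print $2}' "$1"
}

# Usage (not run here; needs bcftools):
#   sample_ids input.sample > ids.txt
#   bcftools reheader -s ids.txt -o renamed.vcf converted.vcf
```

Since nothing here verifies the order, it is worth spot-checking a few genotypes against the original data afterwards.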

If you do have a sample file in addition to your gen file, then Mega2 can convert from IMPUTE2 format to VCF format. See the Mega2 documentation for details.
