Question: Convert imputed genotypes from IMPUTE2 dosage .gen/.info to vcf format
3.5 years ago, 1888 (United States) wrote:

Hi,

Is there an easy way to convert imputed genotypes from probability format like this:

--- rs190467660 10010504 A C 0.929 0.070 0 0.895 0.102 0.002 0.871 
--- rs193112405 10010957 T A 0.973 0.026 0 0.954 0.045 0
I think this is the output from IMPUTE2 (https://mathgen.stats.ox.ac.uk/impute/impute_v2.html), with three genotype probabilities per sample for each SNP, plus an info file. I do not have a sample file. The info file looks like this:

    snp_id rs_id position exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0
    --- rs190467660 10010504 0.070 0.092 0.872 0 -1 -1 -1
    --- rs193112405 10010957 0.035 0.112 0.936 0 -1 -1 -1

to VCF format like this?

##fileformat=VCFv4.1
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT UNR1 UNR2    UNR3    UNR4
chr7    123 SNP1    A   G   100 PASS    INFO    GT:DS   0/0:0.001   0/0:0.000   0/1:0.999   1/1:1.999
chr7    456 SNP2    T   C   100 PASS    INFO    GT:DS   0/0:0.001   0/0:0.000   0/1:1.100   0/0:0.100

I am asking because I would like to apply FastQTL (http://fastqtl.sourceforge.net/), but I only have data in the mldose format.

Thank you very much for your help!

snp vcf • 3.9k views
modified 3.1 years ago by annonymous36830 • written 3.5 years ago by 1888

Where can we get a description of this mldose format?

written 3.5 years ago by Pierre Lindenbaum

I am sorry, I think the format is actually IMPUTE2. If that is right, then this answer (How to convert IMPUTE2 to VCF format) advises using QCTOOL (http://www.well.ox.ac.uk/~gav/qctool/#tutorial) with the command: qctool -g data.gen -og example.vcf, which produces this output:

##fileformat=VCFv4.1
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype calls">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Genotype call probabilities">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample_1        sample_2
NA      10010504        rs190467660     A       C       .       .       .       GT:GP   0/0:0.929,0.07,0        ./.:0.895,0.102,0.002   ./.:0.871
NA      10010957        rs193112405     T       A       .       .       .       GT:GP   0/0:0.973,0.026,0       0/0:0.954,0.045,0       0/0:0.94,0.06

But this is not what I need: the output contains genotype calls and genotype probabilities (GT:GP, three values per sample), whereas I am looking for GT:DS with a single dosage value per sample. Is there another tool that can convert between these formats? Another option might be fcgene (https://sourceforge.net/p/fcgene/wiki/Genotype%20format%20converting%20tool%20:%20FCgene/), converting IMPUTE2 -> Plink -> VCF. Is there a more direct way?
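
For what it is worth, the dosage can be derived directly from the three probabilities: for each sample, DS = P(AB) + 2 * P(BB), i.e. the expected count of the second allele. Below is a minimal Python sketch of such a conversion, not a tested tool. The function names, the `chrom` argument (the .gen lines carry no chromosome), the `sample_N` column names (there is no sample file) and the 0.9 calling threshold are all assumptions for illustration, and it treats the first allele as REF and the second as ALT, as in the QCTOOL output above.

```python
def gen_line_to_vcf(line, chrom="NA", call_threshold=0.9):
    """Convert one IMPUTE2 .gen line into a VCF record with GT:DS fields.

    chrom and call_threshold are illustrative assumptions, not IMPUTE2 settings.
    """
    fields = line.split()
    snp_id, rs_id, pos, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        # A truncated line cannot be split into whole (AA, AB, BB) triplets.
        raise ValueError(f"{rs_id}: {len(probs)} probabilities is not a multiple of 3")

    samples = []
    for i in range(0, len(probs), 3):
        triplet = probs[i:i + 3]  # P(AA), P(AB), P(BB)
        # Expected number of copies of the second (ALT) allele.
        dosage = triplet[1] + 2.0 * triplet[2]
        # Hard call only when the most likely genotype is confident enough.
        best = max(range(3), key=lambda k: triplet[k])
        gt = ("0/0", "0/1", "1/1")[best] if triplet[best] >= call_threshold else "./."
        samples.append(f"{gt}:{dosage:.3f}")

    return "\t".join([chrom, pos, rs_id, allele_a, allele_b, ".", ".", ".", "GT:DS"] + samples)


def gen_to_vcf(gen_lines, chrom="NA", call_threshold=0.9):
    """Yield VCF lines (header first) for an iterable of .gen lines."""
    header_written = False
    for line in gen_lines:
        if not line.strip():
            continue
        record = gen_line_to_vcf(line, chrom, call_threshold)
        if not header_written:
            # No sample file is available, so sample names are placeholders.
            n_samples = len(record.split("\t")) - 9
            names = [f"sample_{i + 1}" for i in range(n_samples)]
            yield "##fileformat=VCFv4.1"
            yield '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype call">'
            yield '##FORMAT=<ID=DS,Number=1,Type=Float,Description="Expected ALT allele dosage">'
            yield "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                             "FILTER", "INFO", "FORMAT"] + names)
            header_written = True
        yield record
```

Passing an open file handle for the .gen file to gen_to_vcf and writing each yielded line to the output would convert the data one SNP at a time, so the whole file never has to fit in memory. A line with an incomplete triplet, like the truncated first example above, is rejected rather than silently padded.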

Thank you

modified 3.5 years ago • written 3.5 years ago by 1888

Were you able to resolve this problem? I also need to convert IMPUTE2 .gprobs files to .vcf format, and QCTOOL processes only a few lines of data before exiting without a clear error.

written 3.1 years ago by annonymous36830