Question: Ways to convert .impute2 to .mldose
4.0 years ago by
European Union
coleman_jonathan370 wrote:

Does anyone know of any software package that can convert .impute2 data to .mldose (i.e. imputed data from IMPUTE2 to imputed data from MACH)? I have tried impute2mach in GenABEL, but it has repeatedly failed with a known error...

Direct conversion preferred, but I'm happy with indirect as long as it works!

snp imputation • 5.6k views
modified 3.9 years ago • written 4.0 years ago by coleman_jonathan370

I have the same problem. Could you please tell me how to write the R script to make the dosage file?

4.0 years ago by
zx87544.3k
London
zx87544.3k wrote:

IMPUTE2 outputs posterior genotype probabilities, three columns per SNP for each individual, e.g. 0.1 0.1 0.8, corresponding to AA, AB and BB; here BB is the most likely genotype. MACH outputs a dosage, one column per SNP, on a scale where 0 = AA, 1 = AB and 2 = BB, so a dosage of 1.7 is closest to BB.

Depending on the size of the data, you can write a quick script in R that would convert posterior probabilities to dosage:

0.1*0+0.1*1+0.8*2=1.7
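A minimal sketch of that calculation, written in Python rather than R. It assumes the usual IMPUTE2 genotype layout of five leading annotation columns (SNP ID, rsID, position, allele A, allele B) followed by one probability triplet per individual; the function names and the example line are hypothetical.

```python
def probs_to_dosage(p_aa, p_ab, p_bb):
    """Expected count of the B allele: 0*P(AA) + 1*P(AB) + 2*P(BB)."""
    return 0 * p_aa + 1 * p_ab + 2 * p_bb


def line_to_dosages(line, n_leading=5):
    """Convert one SNP line into a list of per-individual dosages.

    Assumes n_leading annotation columns, then AA/AB/BB probability triplets.
    """
    fields = line.split()
    probs = [float(x) for x in fields[n_leading:]]
    if len(probs) % 3 != 0:
        raise ValueError("probability columns are not a multiple of three")
    return [
        probs_to_dosage(*probs[i:i + 3])
        for i in range(0, len(probs), 3)
    ]


# Hypothetical line with two individuals (not real data).
example = "--- rs0000001 12345 A G 0.1 0.1 0.8 1 0 0"
print([round(d, 3) for d in line_to_dosages(example)])
```

This gives one list of dosages per SNP. As far as I understand the mldose layout, it has one row per individual, so the per-SNP lists would still need transposing before writing the final file.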


Thanks - that's useful, I'll give it a try!

3.9 years ago by
European Union
coleman_jonathan370 wrote:

I have written a brief cookbook to perform this conversion in UNIX:

http://openwetware.org/wiki/User:Jonathan_R._I._Coleman/Notebook/Notes_and_Protocols/2014/06/27


The awk code for this has now been cleaned up to make it more efficient (credit: Tommy Carstensen).

Alternatively, the University of Washington has an R package, GWASTools, for post-IMPUTE2 conversions: http://www.bioconductor.org/packages/release/bioc/manuals/GWASTools/man/GWASTools.pdf


How does your code deal with "0 0 0", i.e. a no-call? Does it convert it to the reference homozygote (0*0 + 0*1 + 0*2 = 0) or to NA?

At the time you accessed it, my code would convert 0 0 0 to 0 (which is wrong! I think I assumed a non-call would default to 0.33 0.33 0.33, but it doesn't). This is easily fixable by setting 0 0 0 to NA, and I will implement a patch. Thanks for pointing this out!
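A rough sketch of the fix in Python (not the awk code from the cookbook): a triplet whose probabilities sum to zero is treated as a no-call and written as NA instead of 0. The function names and the tolerance are assumptions made for illustration.

```python
def dosage_or_na(p_aa, p_ab, p_bb, tol=1e-6):
    """Return the B-allele dosage, or None when all three probabilities are zero."""
    if p_aa + p_ab + p_bb <= tol:
        # "0 0 0" is a no-call, not a confident AA genotype.
        return None
    return p_ab + 2 * p_bb


def format_dosage(dosage, digits=3):
    """Render a dosage for output, writing missing values as NA."""
    return "NA" if dosage is None else f"{dosage:.{digits}f}"


# Hypothetical triplets: a normal call, a no-call, and a confident AA.
triplets = [(0.1, 0.1, 0.8), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(" ".join(format_dosage(dosage_or_na(*t)) for t in triplets))
```

The point of the check is that a confident AA call (1 0 0) and a no-call (0 0 0) both give 0 under the plain weighted sum, so they have to be separated before the dosage is computed.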

Thanks for the script for the mldose format. Can you please tell me how to get the mlinfo file, i.e. how to convert impute2_info into mlinfo format? Thanks! Best Wishes, Meraj


This is less straightforward: the information metrics produced by these programs are not exactly equivalent, although they are highly correlated (see Marchini and Howie, 2010, www.nature.com/nrg/journal/v11/n7/extref/nrg2796-s3.pdf). The files themselves are also structured differently:

mlinfo:

```
SNP     Al1     Al2     Freq1   MAF     Quality Rsq
rs11089130      C       G       0.3362  0.3362  0.4776  0.0160
```

.impute2_info:

```
snp_id rs_id position exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0
--- rs9628072 50000058 0.033 0.626 0.969 0 -1 -1 -1
```

It would be possible to convert these with a bit of additional information (the alleles of each variant, which should be obtainable from the reference panel used for imputation). However, I'm not sure whether the conversion is necessary. You could simply filter on the impute2 info metric to obtain a list of SNPs to retain for analysis?
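The filtering option could be sketched in Python roughly as follows, assuming a whitespace-delimited .impute2_info file with the header shown above. The function name and the 0.5 cutoff are placeholders only, not a recommended threshold.

```python
def snps_passing_info(lines, min_info):
    """Return rs_ids whose IMPUTE2 info score is at least min_info.

    `lines` is any iterable of text lines (an open file or a list),
    with the header row first.
    """
    rows = iter(lines)
    header = next(rows).split()
    rs_col = header.index("rs_id")
    info_col = header.index("info")
    kept = []
    for row in rows:
        fields = row.split()
        if not fields:
            continue  # skip blank lines
        if float(fields[info_col]) >= min_info:
            kept.append(fields[rs_col])
    return kept


# The example row from the .impute2_info excerpt above.
sample = [
    "snp_id rs_id position exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0",
    "--- rs9628072 50000058 0.033 0.626 0.969 0 -1 -1 -1",
]
print(snps_passing_info(sample, min_info=0.5))
```

With a real file, the same function can be given an open file handle in place of the list.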

4.0 years ago by
Joey410
Seattle
Joey410 wrote:

You can try a program called fcgene (http://sourceforge.net/projects/fcgene/). Look into Chapter 7 of the manual which accompanies the tool.

Thanks,

-Joey