impute2 output files to vcf
3.3 years ago
tarek.mohamed ▴ 350

Hi All,

I need to know how to deal with impute2 output files: how can I convert them to VCF, or is there a way to get VCF files directly as impute2 output?

I have a VCF file with a GWAS dataset that I need to impute with impute2.

I converted the VCF file to PLINK (bed, bim, fam) format, then phased these files using shapeit. Shapeit returned two files (.haps and .sample).

I imputed the .haps and .sample files against the reference panel files using impute2:

$ impute2 -use_prephased_g -known_haps_g snps_omni_6samples.phased.refpanel.haps \
snps_omni_6samples.phased.refpanel.sample \
-h ALL.chr9.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.nosing.haplotypes.gz \
-l ALL.chr9.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.nosing.legend.gz \
-m genetic_map_chr9_combined_b37.txt \
-int 86890852 86983368 \
-Ne 20000 \
-o snps.omni.6samples_imputed

Impute2 returned

$ cat snps.omni.6samples_imputed | head -n5
--- rs45529242 86890852 A T 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0
--- rs11140489 86890931 T A 0 1 0 1 0 0 1 0 0 0 1 0 0 1 0 1 0 0
--- rs182458878 86890954 C A 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0
--- rs187090359 86890989 A G 0.995 0.005 0 1 0 0 0.995 0.005 0 0.995 0.005 0 0.995 0.005 0 0.995 0.005 0
--- rs139866310 86891235 A AAATT 0.021 0.971 0.008 1 0 0 0.992 0.008 0 0 0.992 0.008 0 0.992 0.008 0.992 0.008 0


and

$ cat snps.omni.6samples_imputed_info | head -n5
snp_id rs_id position a0 a1 exp_freq_a1 info certainty type info_type0 concord_type0 r2_type0
--- rs45529242 86890852 A T 0.000 0.000 1.000 0 -1 -1 -1
--- rs11140489 86890931 T A 0.250 1.000 1.000 0 -1 -1 -1
--- rs182458878 86890954 C A 0.000 -0.000 1.000 0 -1 -1 -1
--- rs187090359 86890989 A G 0.002 0.003 0.995 0 -1 -1 -1

and

$ cat snps.omni.6samples_imputed_info_by_sample | head -n5
concord_type0 r2_type0
1.000 1.000
0.989 0.983
0.968 0.873
1.000 1.000


For now, the most efficient way to do this depends on what information you need in the VCF. Do you just need high-likelihood genotype calls (VCF "GT" field), or dosage values ("DS" field), or raw posterior-probability triplets ("GP" field)? And what phase information, if any, do you want to keep?
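To make the three options concrete, here is a minimal Python sketch of how one sample's probability triplet from the .gen file (P(a0/a0), P(a0/a1), P(a1/a1)) maps onto GT, DS and GP. The function name `summarize_triplet` and the 0.9 calling threshold are illustrative assumptions, not something any of the tools in this thread prescribes.

```python
def summarize_triplet(p_aa, p_ab, p_bb, call_threshold=0.9):
    """Return (GT, DS, GP) strings for one IMPUTE2 probability triplet.

    call_threshold is an assumed cutoff: below it the genotype is left missing.
    """
    probs = (p_aa, p_ab, p_bb)
    # DS: expected number of copies of the second allele (a1), between 0 and 2.
    dosage = p_ab + 2.0 * p_bb
    # GT: the most probable genotype, only if it is confident enough.
    best = max(range(3), key=lambda i: probs[i])
    gt = ("0/0", "0/1", "1/1")[best] if probs[best] >= call_threshold else "./."
    # GP: the raw triplet, comma-separated.
    gp = ",".join(f"{p:g}" for p in probs)
    return gt, f"{dosage:g}", gp


if __name__ == "__main__":
    # First sample of rs139866310 in the question's output.
    print(summarize_triplet(0.021, 0.971, 0.008))
```

GT throws away the uncertainty, DS keeps a single number per sample, and GP keeps everything, so which one you need decides which conversion route is good enough.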


I was able to get the genotypes as follows: I re-ran impute2 with the -phase flag to generate a haplotypes file, then used shapeit to convert it to VCF:

shapeit -convert --input-haps impute2 --output-vcf impute2.vcf


How can I keep the raw posterior-probability triplets from the .gen file (the "GP" field)?


Unfortunately, I'm not sure there's any preexisting program that'll integrate everything for you; there isn't even a real standard for simultaneously representing genotype-likelihood and phase information in VCF files yet. (I use the HDS field defined by Minimac4 in my own work, but that only addresses the smaller dosage + phase problem.) You might need to write something yourself. qctool2 may be worth trying, though.
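If you do end up writing something yourself, a minimal Python sketch could look like the following. It writes unphased GT:DS:GP records only, so it does not solve the phase problem described above. Everything named here is an assumption for illustration: the function `gen_to_vcf`, the 0.9 calling threshold, treating a0 as REF and a1 as ALT, and passing the chromosome and sample IDs in by hand (the .gen file contains neither).

```python
def gen_to_vcf(gen_lines, sample_ids, chrom, call_threshold=0.9):
    """Yield VCF lines (header first) for IMPUTE2 .gen lines.

    Assumes a0 is the reference allele and a1 the alternate allele.
    """
    yield "##fileformat=VCFv4.2"
    yield '##FORMAT=<ID=GT,Number=1,Type=String,Description="Best-guess genotype">'
    yield '##FORMAT=<ID=DS,Number=1,Type=Float,Description="Expected a1 allele count">'
    yield '##FORMAT=<ID=GP,Number=G,Type=Float,Description="Genotype posterior probabilities">'
    yield "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                     "INFO", "FORMAT", *sample_ids])
    for line in gen_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        _snp_id, rs_id, pos, a0, a1 = fields[:5]
        probs = [float(x) for x in fields[5:]]
        if len(probs) != 3 * len(sample_ids):
            raise ValueError(f"{rs_id}: expected {3 * len(sample_ids)} probabilities")
        cols = [chrom, pos, rs_id, a0, a1, ".", ".", ".", "GT:DS:GP"]
        for i in range(0, len(probs), 3):
            trip = probs[i:i + 3]
            best = max(range(3), key=trip.__getitem__)
            # Leave the call missing when no genotype is confident enough.
            gt = ("0/0", "0/1", "1/1")[best] if trip[best] >= call_threshold else "./."
            ds = trip[1] + 2.0 * trip[2]
            gp = ",".join(f"{p:g}" for p in trip)
            cols.append(f"{gt}:{ds:g}:{gp}")
        yield "\t".join(cols)


if __name__ == "__main__":
    # Hypothetical sample IDs; real ones would come from the .sample file.
    ids = [f"sample{i}" for i in range(1, 7)]
    with open("snps.omni.6samples_imputed") as gen:
        for vcf_line in gen_to_vcf(gen, ids, chrom="9"):
            print(vcf_line)
```

The chromosome "9" is taken from the reference file names in the question. The output has no contig or INFO header lines, so it is worth validating with your usual VCF tooling before relying on it.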

3.3 years ago
zx8754 11k

Possible duplicate:


My impute2 file has a different identifier than what is required by bcftools to work (CHROM:POS_REF_ALT).
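One possible workaround is to rewrite the identifier column before handing the file to bcftools. The Python sketch below replaces the first column (`---` in the output shown in the question) with CHROM:POS_REF_ALT and leaves the rs ID and probabilities untouched. The function name `rewrite_gen_ids` is hypothetical, the chromosome has to be supplied by hand because the .gen file does not carry it, and which column bcftools actually inspects may depend on the version, so check its documentation first.

```python
def rewrite_gen_ids(gen_lines, chrom):
    """Yield .gen lines whose first column is CHROM:POS_REF_ALT.

    Assumes whitespace-separated columns: snp_id rs_id position a0 a1 probs...
    """
    for line in gen_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        _snp_id, _rs_id, pos, a0, a1 = fields[:5]
        fields[0] = f"{chrom}:{pos}_{a0}_{a1}"
        yield " ".join(fields)


if __name__ == "__main__":
    # File names follow the question; the output name is made up for illustration.
    with open("snps.omni.6samples_imputed") as src, \
            open("snps.omni.6samples_imputed.renamed.gen", "w") as dst:
        for new_line in rewrite_gen_ids(src, chrom="9"):
            dst.write(new_line + "\n")
```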


You can use SHAPEIT to convert IMPUTE2 GEN format to VCF.

QCTOOL also does it, but will not retain phasing information.

Also, MEGA2: A: How to convert IMPUTE2 to VCF format


How can I convert it using shapeit? I did not see GEN format listed as an input option for shapeit -convert. Thanks.


Does this work?

shapeit -convert \
--input-gen gwas \
--output-vcf gwas.vcf


I re-ran impute2 with the -phase flag to generate a haplotypes file, then used shapeit to convert it to VCF:

shapeit -convert --input-haps impute2 --output-vcf impute2.vcf