It’s a little bit convoluted.
I just want to impute SNPs from a file in 23andMe format, like the following:
# rsid chromosome position genotype
rs3094315 1 752566 AA
rs12562034 1 768448 AA
rs3934834 1 1005806 CC
rs9442372 1 1018704 GG
rs3737728 1 1021415 GG
rs11260588 1 1021658 GG
rs6687776 1 1030565 CT
I was told Minimac3 is the best tool for imputing one sample at a time (for my own reasons, I want to impute samples one by one rather than several at once). Minimac3 is easy to use and fast, and I got it working. However, it requires a phased input file, so I need to phase the file described above.
Eagle from the Broad Institute was recommended for phasing, and it seems that Eagle only accepts a genomic profile in VCF format, so I converted the above file to VCF as follows:
##fileformat=VCFv4.2
##filedate=Fri Aug 26 23:11:37 EDT 2016
##source=csv2vcf.pl
##reference=
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GENOTYPE
1 752566 rs3094315 G A . . . GT 1/1
1 768448 rs12562034 G A . . . GT 1/1
1 1005806 rs3934834 C T . . . GT 0/0
1 1018704 rs9442372 A G . . . GT 1/1
1 1021415 rs3737728 A G . . . GT 1/1
1 1021658 rs11260588 G A . . . GT 0/0
and named it “myprofile.vcf”. Then I ran eagle using the following:
eagle --vcf myprofile.vcf --geneticMapFile Eagle/tables/genetic_map_hg19_withX.txt.gz --outPrefix /tmp/myprofile.beagleImputed
“Eagle/tables/genetic_map_hg19_withX.txt.gz” was provided by eagle.
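For reference, the 23andMe-to-VCF conversion step described above can be sketched in Python. This is a minimal sketch, not the csv2vcf.pl script named in the header: `REF_ALT` is a hypothetical lookup whose values are copied from the VCF records shown earlier, and a real conversion would need REF/ALT alleles from a reference panel. The sketch also writes a `##contig` header line per chromosome, which the VCF format allows; whether that alone satisfies Eagle is an assumption.

```python
# Sketch: convert 23andMe genotype lines into VCF lines.
# ASSUMPTION: REF_ALT is a hypothetical (REF, ALT) lookup per rsid; the entries
# below are copied from the VCF shown above, not from a reference panel.
REF_ALT = {
    "rs3094315": ("G", "A"),
    "rs12562034": ("G", "A"),
    "rs3934834": ("C", "T"),
}


def genotype_to_gt(genotype, ref, alt):
    """Map a two-letter genotype such as 'CT' to a VCF GT such as '0/1'.

    Returns None for anything that is not two REF/ALT bases (e.g. '--').
    """
    if len(genotype) != 2:
        return None
    codes = []
    for base in genotype:
        if base == ref:
            codes.append(0)
        elif base == alt:
            codes.append(1)
        else:
            return None
    return "/".join(str(code) for code in sorted(codes))


def convert(lines, sample="GENOTYPE"):
    """Return VCF header and record lines for the given 23andMe lines."""
    records, contigs = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the 23andMe comment header
        fields = line.split()
        if len(fields) != 4:
            continue
        rsid, chrom, pos, genotype = fields
        if rsid not in REF_ALT:
            continue  # no REF/ALT known for this SNP
        ref, alt = REF_ALT[rsid]
        gt = genotype_to_gt(genotype, ref, alt)
        if gt is None:
            continue  # no-call or allele that matches neither REF nor ALT
        if chrom not in contigs:
            contigs.append(chrom)
        records.append("\t".join([chrom, pos, rsid, ref, alt, ".", ".", ".", "GT", gt]))
    header = ["##fileformat=VCFv4.2"]
    header += ["##contig=<ID=%s>" % chrom for chrom in contigs]
    header.append('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">')
    header.append("\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                             "FILTER", "INFO", "FORMAT", sample]))
    return header + records
```

Calling `convert(open("genome.txt"))` (a hypothetical input file name) returns the lines to write out; VCF columns are tab-separated, so the records are joined with tabs.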
It didn’t go through. The error I got was:
[W::vcf_parse] contig '1' is not defined in the header. (Quick workaround: index the file with tabix.)
ERROR: Multi-allelic site found (i.e., ALT contains multiple alleles)
Either drop or split (bcftools norm -m) multi-allelic variants
I am not sure exactly what this means. “Index the file with tabix”: index which file? It cannot be the “genetic_map_hg19_withX.txt.gz” file, right? So I tried “tabix myprofile.vcf” and got the following error:
Not a BGZF file: data/genome_3j.vcf
tbx_index_build failed: data/genome_3j.vcf
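The message above refers to BGZF, the blocked gzip format written by bgzip; tabix only indexes files compressed that way, so a plain-text .vcf is rejected. A minimal sketch of that check, assuming the standard BGZF block header (gzip magic bytes, the extra-field flag, and a leading "BC" subfield); `looks_like_bgzf` is a hypothetical helper name, not part of tabix:

```python
def looks_like_bgzf(path):
    """Return True if the file starts with a BGZF block header."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    if len(head) < 16:
        return False
    # gzip magic bytes (1f 8b) and deflate compression method (08)
    if head[:3] != b"\x1f\x8b\x08":
        return False
    # FLG byte must have the FEXTRA bit set, since BGZF uses an extra field
    if not head[3] & 4:
        return False
    # The first extra subfield is 'B', 'C' with a 2-byte payload length
    return head[12:16] == b"BC\x02\x00"
```

A plain-text VCF begins with "##" and so returns False here. The usual route would be `bgzip myprofile.vcf` followed by `tabix -p vcf myprofile.vcf.gz`, though that was not tried above.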
At this point the errors make no sense to me. I have probably done something badly wrong.
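The two conditions that the Eagle messages name can be looked for in the VCF with a short script. This is a rough diagnostic sketch, not Eagle's own logic: it assumes "contig not defined" means there is no matching `##contig=<ID=...>` header line, and that "multi-allelic" means a comma in the ALT column, which is how VCF writes several alternate alleles. `check_vcf` is a hypothetical helper name.

```python
def check_vcf(lines):
    """Scan VCF lines; return (undeclared contigs, multi-allelic records)."""
    declared = set()
    undeclared = set()
    multiallelic = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##contig=<"):
            # Pull the ID=... value out of the ##contig=<...> header line
            body = line[len("##contig=<"):].rstrip(">")
            for part in body.split(","):
                if part.startswith("ID="):
                    declared.add(part[3:])
            continue
        if not line or line.startswith("#"):
            continue  # other header lines and blank lines
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete record
        chrom, pos, rsid, ref, alt = fields[:5]
        if chrom not in declared:
            undeclared.add(chrom)
        if "," in alt:
            multiallelic.append((chrom, pos, rsid, alt))
    return sorted(undeclared), multiallelic
```

Running `check_vcf(open("myprofile.vcf"))` returns the chromosome names that lack a header declaration and the records whose ALT column lists more than one allele; the latter are the ones the error suggests dropping or splitting with `bcftools norm -m`.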
Can someone please help? Either with Eagle/tabix or some other workaround.
I just want to impute some SNPs in this very popular and easy format. Can’t someone write a program that takes such a file as input, plus a couple of options pointing to the required reference SNP database and/or genome sequences? Actually, someone has already done that: the Michigan Imputation Server. However, you need to register an account, upload your data to their server, and download the results from there. This is awesome and the way to go in terms of ease of use, but you cannot build the server into a pipeline.
The “manuals” or “READMEs” or “instructions” are not good enough for me.
Thanks for any instructions.