Step-by-step instructions to phase with Eagle?
5.8 years ago
moxu ▴ 500

It’s a little bit convoluted.

I just want to impute SNPs from a file in 23andMe format like the following:

# rsid chromosome position genotype
rs3094315  1 752566 AA
rs12562034 1 768448 AA
rs3934834  1 1005806  CC
rs9442372  1 1018704  GG
rs3737728  1 1021415  GG
rs11260588 1 1021658  GG
rs6687776  1 1030565  CT


I was told Minimac3 is the best tool for imputing one sample at a time (for my own reasons I want to impute sample by sample, not multiple samples at once). Minimac3 is easy to use and fast, and I got it working. However, it requires a phased input file, so I need to phase the file described above.

Eagle from the Broad Institute was recommended for phasing. It seems that Eagle only takes a genomic profile in VCF format, so I converted the above file into VCF as follows:

##fileformat=VCFv4.2
##filedate=Fri Aug 26 23:11:37 EDT 2016
##source=csv2vcf.pl
##reference=
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO  FORMAT  GENOTYPE
1 752566 rs3094315  G A . . .  GT 1/1
1 768448 rs12562034 G A . . .  GT 1/1
1 1005806  rs3934834  C T . .  .  GT 0/0
1 1018704  rs9442372  A G . .  .  GT 1/1
1 1021415  rs3737728  A G . .  .  GT 1/1
1 1021658  rs11260588 G A . .  .  GT 0/0


and named it “myprofile.vcf”. Then I ran eagle using the following:

eagle --vcf myprofile.vcf  --geneticMapFile Eagle/tables/genetic_map_hg19_withX.txt.gz --outPrefix /tmp/myprofile.beagleImputed

“Eagle/tables/genetic_map_hg19_withX.txt.gz” was provided by eagle.


It didn’t go through. The error I got was:

[W::vcf_parse] contig '1' is not defined in the header. (Quick workaround: index the file with tabix.)
ERROR: Multi-allelic site found (i.e., ALT contains multiple alleles)
Either drop or split (bcftools norm -m) multi-allelic variants


I’m not sure exactly what this means. “Index the file with tabix”: index which file? It cannot be the “genetic_map_hg19_withX.txt.gz” file, right? So I tried “tabix myprofile.vcf” and got the following error:

Not a BGZF file: data/genome_3j.vcf
tbx_index_build failed: data/genome_3j.vcf


At this point the errors make no sense to me. I have probably done something terribly wrong.

I just want to impute some SNPs in this very popular and easy format. Can’t someone write a program that takes such a file as input, plus a couple of options pointing to the required reference SNP database and/or genome sequences? Actually, someone has already done that: the Michigan Imputation Server. But you need to register an account, upload your data to their server, and download the results from there. This is great in terms of ease of use, but you cannot build the server into a local pipeline.
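
For the conversion step at least, bcftools has a built-in mode for 23andMe-style files (`bcftools convert --tsv2vcf`), which may be easier to script than a custom converter. The sketch below is only an illustration: the file names `profile.txt` and `hg19.fa` and the sample label `SAMPLE1` are hypothetical, and it assumes bcftools and tabix are installed and that the reference FASTA matches the genome build of the 23andMe export.

```shell
# Convert a 23andMe-style file to a bgzipped, tabix-indexed VCF.
# All file names and the sample label are placeholders (assumptions).
convert_23andme() {
    in=$1; ref=$2; sample=$3; out=$4
    [ -r "$in" ]  || { echo "cannot read $in"  >&2; return 1; }
    [ -r "$ref" ] || { echo "cannot read $ref" >&2; return 1; }
    # Column order follows the 23andMe layout: rsid, chromosome, position, genotype
    bcftools convert --tsv2vcf "$in" -c ID,CHROM,POS,AA \
        -f "$ref" -s "$sample" -Oz -o "$out" &&
    tabix -p vcf "$out"
}

# Usage (hypothetical names):
#   convert_23andme profile.txt hg19.fa SAMPLE1 myprofile.vcf.gz
```

Because the output is written with `-Oz`, it is already bgzip-compressed, so tabix can index it directly.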

The “manuals” or “READMEs” or “instructions” are not good enough for me.

Thanks for any instructions.

5.8 years ago

You need to bgzip your VCF file and then index it using tabix.

bgzip myprofile.vcf
tabix myprofile.vcf.gz


The contig warning should also go away if you define the contigs in the VCF header, for example:

##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
...
...
...
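
The `##contig` lines shown above can be added to an existing plain-text VCF without hand-editing. This is a minimal sketch using awk; the function name `add_contigs` and the file names are made up for illustration, and it assumes the VCF is still uncompressed.

```shell
# Insert the lines from a contig file just before the #CHROM header line.
# $1 = uncompressed VCF, $2 = text file holding the ##contig lines to add
add_contigs() {
    vcf=$1; contigs=$2
    awk -v contigs="$contigs" '
        /^#CHROM/ {
            while ((getline line < contigs) > 0) print line
            close(contigs)
        }
        { print }
    ' "$vcf"
}

# Usage (hypothetical file names; the length is the chromosome 1 value quoted above):
#   printf '##contig=<ID=1,length=249250621>\n' > contigs.txt
#   add_contigs myprofile.vcf contigs.txt > myprofile.contigs.vcf
```

Run this before bgzip and tabix, since the index is built from the compressed file.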


Regarding multi-allelic sites: as the error message says, either remove them (using vcftools, for example) or split them using bcftools. This should be done before you index your VCF.
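
As a rough sketch of the splitting step: `bcftools norm -m -any` splits multi-allelic records into biallelic ones, and a one-line awk filter shows which records are affected beforehand. The function and file names here are hypothetical, and the awk check assumes an uncompressed, tab-delimited VCF.

```shell
# Print records whose ALT column (5th field) lists more than one allele.
list_multiallelic() {
    awk -F'\t' '!/^#/ && $5 ~ /,/' "$1"
}

# Split multi-allelic records into biallelic ones, then compress and index.
# Assumes bcftools and tabix are installed; $1 = input VCF, $2 = output .vcf.gz
split_multiallelic() {
    bcftools norm -m -any -Oz -o "$2" "$1" &&
    tabix -p vcf "$2"
}

# Usage (hypothetical file names):
#   list_multiallelic myprofile.vcf
#   split_multiallelic myprofile.vcf myprofile.split.vcf.gz
```

If `list_multiallelic` prints nothing, the error may come from something else in the file, such as fields that are not tab-separated.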


Great! After doing the bgzip on the .vcf, it went much further. I ran

eagle --vcf myprofile.vcf.gz  --geneticMapFile genetic_map_hg19_withX.txt.gz --outPrefix /tmp/t --chrom 21


and got the following error:

Phasing samples 1-1
WARNING: Sample 1 (1-indexed) has a het count of 0
ERROR: Failed to allocate 18446744073709551596 bytes
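
The warning says the sample has no heterozygous genotypes on the chromosome being phased. One way to see what Eagle is actually being given is to count records and heterozygous calls per chromosome. Note that the command above passes `--chrom 21`, while the VCF excerpt shown earlier only lists chromosome 1; whether the full file contains chromosome 21 is not stated. The function name `count_hets` is made up, and the sketch assumes a single-sample, tab-delimited, uncompressed VCF with GT as the first FORMAT field.

```shell
# Per chromosome, print: chromosome, number of records, number of het calls.
count_hets() {
    awk -F'\t' '
        /^#/ { next }
        {
            n[$1]++
            split($10, fmt, ":")                 # GT assumed to come first
            k = split(fmt[1], allele, "[/|]")    # handles 0/1 and 0|1
            if (k == 2 && allele[1] != "." && allele[2] != "." &&
                allele[1] != allele[2])
                het[$1]++
        }
        END { for (c in n) printf "%s\t%d\t%d\n", c, n[c], het[c] }
    ' "$1" | sort
}

# Usage (for a bgzipped file, decompress to a temporary copy first):
#   count_hets myprofile.vcf
```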


I'm not sure why Eagle requires this much memory (I assume that is what the number refers to).
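
One observation, not a confirmed diagnosis: the number in the error is exactly 2^64 - 20, which is what a small negative value looks like when it is stored as an unsigned 64-bit integer. That would suggest an internal size calculation went negative, not that Eagle genuinely needs that much memory. The check below verifies the arithmetic; the helper name `decode_u64` is made up, and it compares digit strings because shell arithmetic is signed 64-bit and cannot hold the value directly.

```shell
reported=18446744073709551596   # from the Eagle error message
u64_max=18446744073709551615    # 2^64 - 1, the largest unsigned 64-bit value

# Express a value just below 2^64 as "2^64 - k".
# Only handles values within 1000 of 2^64, which is enough for this check.
decode_u64() {
    head_r=${1%???};       tail_r=${1#"$head_r"}
    head_m=${u64_max%???}; tail_m=${u64_max#"$head_m"}
    if [ "$head_r" = "$head_m" ] && [ "$tail_r" -le "$tail_m" ]; then
        # Strip leading zeros so the tail is not read as octal
        tail_r=${tail_r#0}; tail_r=${tail_r#0}
        echo "2^64 - $(( tail_m - tail_r + 1 ))"
    else
        echo "not close to 2^64"
    fi
}

decode_u64 "$reported"   # prints: 2^64 - 20
```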

It didn't complain about multi-allelic sites this time, although I have not dealt with that problem yet because I don't know how. Could you please give me the bcftools command line for the splitting? I don't see a split command or option, or anything that looks related.

Thanks much!


This may have been answered, but may I please check if there's a solution to the memory issue? I am having the same problem whilst trying to run Eagle2. Thanks so much.


I am also facing the exact same error. May I know how you fixed this issue?


Did anyone figure out a solution to this error? ERROR: Failed to allocate 18446744073709551596 bytes

I keep getting it and I can't find an answer anywhere online.


I have the same problem. Did anybody find a solution?


sorry, I found the problem!

3.8 years ago

Any solution/suggestion to the memory problem? Thanks.