Question: Step by step instructions to phase with Eagle?
2.8 years ago, moxu wrote:

It’s a little bit convoluted.

I just want to impute SNPs from a file in 23andMe format, like the following:

# rsid chromosome position genotype
rs3094315  1 752566 AA
rs12562034 1 768448 AA
rs3934834  1 1005806  CC
rs9442372  1 1018704  GG
rs3737728  1 1021415  GG
rs11260588 1 1021658  GG
rs6687776  1 1030565  CT

I was told Minimac3 is the best tool for imputing one sample at a time (for my purposes I need to impute samples one by one, not several at once). Minimac3 is easy to use and fast, and I got it working. However, it requires a phased input file, so I need to phase the file described above.

Eagle from the Broad Institute was recommended for phasing. It seems that Eagle only accepts genotype data in VCF format, so I converted the file above into a VCF, as follows:

##fileformat=VCFv4.2
##filedate=Fri Aug 26 23:11:37 EDT 2016
##source=csv2vcf.pl
##reference=
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO  FORMAT  GENOTYPE
1 752566 rs3094315  G A . . .  GT 1/1
1 768448 rs12562034 G A . . .  GT 1/1
1 1005806  rs3934834  C T . .  .  GT 0/0
1 1018704  rs9442372  A G . .  .  GT 1/1
1 1021415  rs3737728  A G . .  .  GT 1/1
1 1021658  rs11260588 G A . .  .  GT 0/0

and named it “myprofile.vcf”. Then I ran eagle using the following:

eagle --vcf myprofile.vcf  --geneticMapFile Eagle/tables/genetic_map_hg19_withX.txt.gz --outPrefix /tmp/myprofile.beagleImputed


“Eagle/tables/genetic_map_hg19_withX.txt.gz” ships with Eagle.

It didn’t go through. The error I got was:

[W::vcf_parse] contig '1' is not defined in the header. (Quick workaround: index the file with tabix.)
ERROR: Multi-allelic site found (i.e., ALT contains multiple alleles)
       Either drop or split (bcftools norm -m) multi-allelic variants

I'm not sure exactly what this means. “Index the file with tabix”: index which file? It can't be the “genetic_map_hg19_withX.txt.gz” file, right? So I tried “tabix myprofile.vcf” and got the following error:

Not a BGZF file: data/genome_3j.vcf
tbx_index_build failed: data/genome_3j.vcf

At this point I am stuck. I have probably done something terribly wrong.

Can someone please help, either with Eagle/tabix or with some other workaround?

I just want to impute some SNPs in this very popular and easy format. Couldn't someone write a program that takes such a file as input, plus a couple of options pointing to the required reference SNP database and/or genome sequences? Actually, someone already has: the Michigan Imputation Server. But you need to register an account, upload your data to their server, and download the results from there. That is great in terms of ease of use, but you cannot build the server into a local pipeline.

The “manuals” or “READMEs” or “instructions” are not good enough for me.

Thanks for any instructions.

2.8 years ago, geek_y (Barcelona/CRG/London/Imperial) wrote:

You need to bgzip your VCF file and then index it using tabix.

bgzip myprofile.vcf
tabix -p vcf myprofile.vcf.gz

Alternatively, add the missing contig lines to the VCF header:

##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
...
...
...
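
A minimal sketch of the header route, assuming an uncompressed VCF. It uses only the two contig lengths quoted above, so the remaining chromosomes would still need their own lines. The function name add_contigs and the output file name are made up for illustration.

```shell
# Insert ##contig lines immediately before the #CHROM column header.
# Only the two lengths quoted in this answer are included here.
add_contigs() {  # usage: add_contigs in.vcf > out.vcf
  awk '
    /^#CHROM/ {
      print "##contig=<ID=1,length=249250621>"
      print "##contig=<ID=2,length=243199373>"
    }
    { print }
  ' "$1"
}

# Example (hypothetical output name):
# add_contigs myprofile.vcf > myprofile.contigs.vcf
```

Every other line passes through unchanged, so the result can be bgzipped and indexed as before.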

As for the multi-allelic sites, either remove them (with vcftools, for example) or split them with bcftools norm -m, as the error message suggests. Do this before you index your VCF.
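
For the splitting step, these are the invocations I would try first. Treat this as a sketch: the function names and output file names are hypothetical, and list_multiallelic is just a plain awk check (not part of bcftools) that shows which records have more than one allele in the ALT column.

```shell
# Show records whose ALT column (field 5) lists more than one allele.
# Reads an uncompressed VCF on stdin; for a .vcf.gz, pipe it through `gzip -dc`.
list_multiallelic() {
  awk '!/^#/ && $5 ~ /,/ { print $1 ":" $2, $3, $5 }'
}

# Option 1: split each multi-allelic record into biallelic records.
split_multiallelic() {
  bcftools norm -m -any -Oz -o myprofile.split.vcf.gz myprofile.vcf.gz
}

# Option 2: drop them and keep only biallelic SNPs.
drop_multiallelic() {
  bcftools view -m2 -M2 -v snps -Oz -o myprofile.biallelic.vcf.gz myprofile.vcf.gz
}

# Example: gzip -dc myprofile.vcf.gz | list_multiallelic
```

Whichever output you keep, index it with tabix afterwards and pass that file to Eagle.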


Great! After bgzipping the VCF, Eagle got much further. I ran

eagle --vcf myprofile.vcf.gz  --geneticMapFile genetic_map_hg19_withX.txt.gz --outPrefix /tmp/t --chrom 21

and got the following error:

Phasing samples 1-1
WARNING: Sample 1 (1-indexed) has a het count of 0
ERROR: Failed to allocate 18446744073709551596 bytes

I'm not sure why Eagle would need this much memory, assuming that is what the number refers to.

It didn't complain about multi-allelic sites this time, although I have not dealt with them yet because I don't know how. Could you please give me the bcftools command line for the splitting? I don't see a split command or option, or anything that looks related.

Thanks much!
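
The het-count warning can be checked directly against the VCF. This is a rough sketch, assuming a single-sample file where the sample is the last column and GT is the first FORMAT subfield. The function name count_hets is made up for illustration.

```shell
# Count heterozygous genotype calls on one chromosome of a single-sample VCF.
count_hets() {  # usage: count_hets CHROM < file.vcf
  awk -v chrom="$1" '
    !/^#/ && $1 == chrom {
      split($NF, f, ":")           # f[1] is the GT value, e.g. 0/1
      n = split(f[1], a, "[/|]")   # alleles on either side of / or |
      if (n == 2 && a[1] != a[2]) hets++
    }
    END { print hets + 0 }
  '
}

# Example: gzip -dc myprofile.vcf.gz | count_hets 21
```

A result of 0 for chromosome 21 would agree with the warning that Eagle printed.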

Reply written 2.8 years ago by moxu (modified 2.8 years ago by geek_y)

This may have been answered, but may I please check if there's a solution to the memory issue? I am having the same problem whilst trying to run Eagle2. Thanks so much.

Reply written 21 months ago by Noluthando0

sorry, I found the problem!

Reply written 2.7 years ago by fatima10 (modified 2.7 years ago)
9 months ago, isidrolauscher wrote:

Any solution/suggestion to the memory problem? Thanks.
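
One observation on the number itself: 18446744073709551596 is exactly 2^64 - 20, which is what -20 looks like when it is read as an unsigned 64-bit integer. So the message probably reflects a negative size computed inside Eagle rather than a real memory requirement, possibly related to phasing a single sample with a het count of 0. That is a guess, not something the thread confirms. Eagle also has a reference-panel mode (--vcfRef with --vcfTarget) that may suit single samples better. In the sketch below, the function names and the reference panel file name are placeholders.

```shell
# Print a signed value as the shell's printf sees it unsigned
# (assumes a 64-bit printf, as on typical Linux systems).
as_unsigned64() {
  printf '%u\n' "$1"
}
as_unsigned64 -20

# Hypothetical reference-panel run; ref_panel_chr21.bcf is a placeholder
# for an indexed, phased panel file that you would have to supply.
phase_with_reference() {
  eagle --vcfRef ref_panel_chr21.bcf \
        --vcfTarget myprofile.vcf.gz \
        --geneticMapFile genetic_map_hg19_withX.txt.gz \
        --outPrefix /tmp/t --chrom 21
}
```

If the first command prints the same number as the error message, the wrap-around reading is at least consistent with what Eagle reported.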
