Speeding up Eagle phasing and imputation
11 months ago
PeterKW ▴ 90

Hi,

I am doing phasing and imputation using Eagle and it is quite slow. The program gives me a warning:

> WARNING: --vcfRef does not end in '.bcf'; BCF input is fastest


The command line I am using is:

eagle --vcfRef ref.vcf.gz --vcfTarget target.vcf.gz --geneticMapFile=genetic_map_1cMperMb.txt --chromX=30 --pbwtIters=10 --numThreads=30 --outPrefix=eagle_target_1 --chrom=1

I tried converting my VCF file to BCF format using:

bcftools view target.vcf -Oz -o target.bcf.gz

and

bcftools index -f target.bcf.gz

but I still get the same warning "--vcfRef does not end in '.bcf'; BCF input is fastest".

Could you please help me see where I am going wrong? I would like to use BCF format if it is indeed faster, and then later convert back to VCF. I would also appreciate code to convert from BCF back to VCF.

Thanks

11 months ago
4galaxy77 2.3k

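bcftools view target.vcf -Oz -o target.bcf.gz - this writes a bgzip-compressed VCF, not a BCF: -Oz selects compressed VCF output whatever the file is called. Use -Ob for compressed BCF, and give the file a name ending in .bcf, since that is what the warning checks. Note also that the warning refers to --vcfRef, so it is the reference file that needs converting (then pass --vcfRef ref.bcf to Eagle):

bcftools view ref.vcf.gz -Ob -o ref.bcf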

Having said that, using a .bcf only speeds up reading in the file (I think), so I doubt it will make a massive difference overall. If you want to speed it up, use fewer PBWT iterations (--pbwtIters, which is 10 in your command) at the cost of accuracy. Also, it's possible you are using too many threads (--numThreads=30).
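For the conversion back to VCF asked about in the question, here is a minimal sketch, assuming bcftools is on the PATH. The function names and the file names in the usage comments (ref.bcf, phased.bcf, phased.vcf.gz) are hypothetical placeholders, not anything Eagle or bcftools defines.

```shell
# Convert a VCF (plain or bgzipped) to compressed BCF, then index it.
# $1 = input VCF, $2 = output file (name it *.bcf so Eagle's check passes)
vcf_to_bcf() {
    bcftools view "$1" -Ob -o "$2" && bcftools index -f "$2"
}

# Convert a BCF back to bgzip-compressed VCF, then write a tabix (.tbi) index.
# $1 = input BCF, $2 = output file (conventionally *.vcf.gz)
bcf_to_vcf() {
    bcftools view "$1" -Oz -o "$2" && bcftools index -f -t "$2"
}

# Usage (placeholder file names):
#   vcf_to_bcf ref.vcf.gz ref.bcf
#   bcf_to_vcf phased.bcf phased.vcf.gz
```

The output type is set by -O (b for compressed BCF, z for compressed VCF), not by the file extension, so the extension only needs to match what downstream tools expect.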

Thanks @4galaxy77, this is very useful feedback.