Question: How to calculate Imputation Accuracy Estimates like concordance with BEAGLE?
4.7 years ago by
Shab86 (270) wrote:

Hi all,

I have an imputed file output from BEAGLE (IMPUTED FILE). Rows are SNPs and columns are individuals. I would now like to calculate imputation accuracy estimates such as concordance or Rsq and plot them against MAF (minor allele frequency). How do I calculate them? Are there any tools that can generate such statistics?

Any help is highly appreciated.

sequencing snp impute R genome • 3.4k views
modified 3.1 years ago by vskale135 (10) • written 4.7 years ago by Shab86 (270)
3.1 years ago by
United Kingdom
vskale135 (10) wrote:

Hello All,

I would also like to determine imputation accuracy in our GBS dataset. Here is what I did:

1) Randomly selected 1% of SNPs:
   zcat all.vcf.gz | awk '$1~/^#/ || rand()<=0.01' | bgzip -c > eval.vcf.gz

2) Excluded the evaluation sites from the original VCF:
   bcftools isec -C all.vcf.gz eval.vcf.gz -Oz > impute.vcf.gz

3) Imputed the missing data using BEAGLE v4.1:
   java -Xmx100g -jar beagleV4.1.jar gt=impute.vcf.gz out=imputed window=100 overlap=30 niterations=10

When I compared imputed.vcf.gz and eval.vcf.gz using vcf-compare, I got the following output:

SN Number of REF matches: 0
SN Number of ALT matches: 0
SN Number of REF mismatches: 0
SN Number of ALT mismatches: 0
SN Number of samples in GT comparison: 0
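One thing worth checking when every count is zero is whether the two files share any sites at all. If the evaluation sites were removed from the input entirely and no reference panel was supplied, they may simply be absent from the imputed output. Below is a minimal Python sketch of that check, assuming standard tab-delimited VCF text. The function names are illustrative and not part of any tool mentioned here.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF file for text reading."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt")


def site_keys(lines):
    """Return the set of (CHROM, POS) keys for all records in an iterable of VCF lines."""
    keys = set()
    for line in lines:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        keys.add((fields[0], int(fields[1])))
    return keys


def shared_sites(lines_a, lines_b):
    """Return the (CHROM, POS) keys present in both sets of VCF lines."""
    return site_keys(lines_a) & site_keys(lines_b)


# Example usage with the file names from the post (assumed to exist locally):
# with open_vcf("imputed.vcf.gz") as a, open_vcf("eval.vcf.gz") as b:
#     print(len(shared_sites(a, b)), "shared sites")
```

If this reports zero shared sites, no genotype comparison between the two files is possible, whatever tool is used.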

I request you to please help.

Thanking you with best regards


written 3.1 years ago by vskale135 (10)

To estimate the quality of imputation, I think imputed.vcf.gz should be compared with all.vcf.gz and not with eval.vcf.gz :)

written 2.9 years ago by Gennady Khvorykh (90)
4.7 years ago by
United States
Zev.Kronenberg (11k) wrote:

Three step process.

  1. Drop some % of your genotype calls.

  2. Impute

  3. Measure VCF concordance of original and imputed VCF file.

I've done this. You should play with the % of genotypes you remove and MAF.
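The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: genotypes are held in memory as a list of rows (sites), each a list of GT strings (samples), and the helper names `mask_genotypes` and `concordance` are hypothetical. Phase is ignored when comparing, because BEAGLE writes phased genotypes (`0|1`) while the original calls are usually unphased (`0/1`).

```python
import random
import re

MISSING = "./."


def normalize(gt):
    """Return a genotype as a sorted tuple of allele strings, ignoring phase."""
    return tuple(sorted(re.split(r"[/|]", gt)))


def mask_genotypes(matrix, fraction, seed=0):
    """Set a random fraction of the called genotypes to missing.

    Returns a masked copy of the matrix and the list of (row, column)
    cells that were masked, so they can be scored after imputation.
    """
    rng = random.Random(seed)
    called = [
        (i, j)
        for i, row in enumerate(matrix)
        for j, gt in enumerate(row)
        if "." not in gt
    ]
    cells = rng.sample(called, round(len(called) * fraction))
    masked = [row[:] for row in matrix]
    for i, j in cells:
        masked[i][j] = MISSING
    return masked, cells


def concordance(truth, imputed, cells):
    """Fraction of the masked cells whose imputed genotype equals the original."""
    if not cells:
        return float("nan")
    hits = sum(normalize(truth[i][j]) == normalize(imputed[i][j]) for i, j in cells)
    return hits / len(cells)
```

Only the masked cells are scored, since genotypes that were never hidden are passed through by the imputation and would inflate the estimate. Repeating the procedure with different values of `fraction` corresponds to varying the percentage of genotypes removed.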

written 4.7 years ago by Zev.Kronenberg (11k)

Thanks, Zev, for your reply. I have already done step 1: I removed bad-quality calls and then masked the genotype file, which I then used in BEAGLE for imputation. Now I have the imputed file and the original one, and from these files I would like to get accuracy estimates such as concordance. The imputed file is the one I attached in the original post. Any idea how to get those estimates?

modified 4.7 years ago • written 4.7 years ago by Shab86 (270)

To test the accuracy, you remove high-quality variant calls, not low-quality calls.

written 4.7 years ago by Zev.Kronenberg (11k)

I have done steps 1 and 2, and my main query is about step 3. Are there any tools I can use for step 3?

written 4.6 years ago by Shab86 (270)

You can use vcf-compare or bcftools stats to get statistics, which you can plot using plot-vcfstats. Could you please let me know how you performed steps 1 and 2? I don't have a reference panel.
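For the per-MAF breakdown asked about in the original question, a short Python sketch might look like the following. It assumes biallelic diploid sites and paired lists of GT strings for the same samples in the original and imputed files. The r2 here is the squared Pearson correlation between true and imputed ALT-allele counts, which is one common definition and not necessarily the statistic any particular tool reports. The function names and MAF bin edges are illustrative.

```python
import math
import re
from collections import defaultdict


def dosage(gt):
    """ALT allele count for a biallelic genotype, or None if any allele is missing."""
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return None
    return sum(a != "0" for a in alleles)


def site_stats(truth_gts, imputed_gts):
    """Return (maf, concordance, r2) for one site, or None if nothing is comparable."""
    pairs = [(dosage(t), dosage(i)) for t, i in zip(truth_gts, imputed_gts)]
    pairs = [(t, i) for t, i in pairs if t is not None and i is not None]
    if not pairs:
        return None
    n = len(pairs)
    # MAF is computed from the true genotypes (diploid, so 2 alleles per sample).
    alt_freq = sum(t for t, _ in pairs) / (2 * n)
    maf = min(alt_freq, 1 - alt_freq)
    conc = sum(t == i for t, i in pairs) / n
    mean_t = sum(t for t, _ in pairs) / n
    mean_i = sum(i for _, i in pairs) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in pairs)
    var_t = sum((t - mean_t) ** 2 for t, _ in pairs)
    var_i = sum((i - mean_i) ** 2 for _, i in pairs)
    # r2 is undefined when either side has no variation at this site.
    r2 = cov * cov / (var_t * var_i) if var_t > 0 and var_i > 0 else float("nan")
    return maf, conc, r2


def summarize_by_maf(stats, edges=(0.0, 0.05, 0.1, 0.2, 0.3, 0.5)):
    """Map each MAF bin (lo, hi) to (n_sites, mean concordance, mean r2)."""
    bins = defaultdict(list)
    for maf, conc, r2 in stats:
        for lo, hi in zip(edges, edges[1:]):
            if maf <= hi:
                bins[(lo, hi)].append((conc, r2))
                break
    summary = {}
    for key, values in sorted(bins.items()):
        r2_vals = [r for _, r in values if not math.isnan(r)]
        mean_conc = sum(c for c, _ in values) / len(values)
        mean_r2 = sum(r2_vals) / len(r2_vals) if r2_vals else float("nan")
        summary[key] = (len(values), mean_conc, mean_r2)
    return summary
```

The resulting per-bin means can then be plotted against the bin midpoints with any plotting tool.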

Thanks and regards


written 3.1 years ago by vskale135 (10)

I'm trying to do the same thing, except that my data is multi-allelic. How do we calculate imputation accuracy for multi-allelic data? I'll appreciate any help on that. Thanks!
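One possible approach for multi-allelic sites, sketched below in Python, is to compare genotypes as unordered sets of allele indices instead of ALT-allele counts. Exact genotype matches give a strict concordance, and the fraction of shared alleles gives partial credit. Both function names are hypothetical, and this is an assumption about what "accuracy" should mean here, not an established standard.

```python
import re
from collections import Counter


def alleles(gt):
    """Split a GT string such as '1/2' or '2|1' into allele indices; None if missing."""
    parts = re.split(r"[/|]", gt)
    return None if "." in parts else parts


def genotype_match(truth_gt, imputed_gt):
    """True if both genotypes carry the same alleles, ignoring phase and order."""
    t, i = alleles(truth_gt), alleles(imputed_gt)
    if t is None or i is None:
        return None
    return sorted(t) == sorted(i)


def shared_allele_fraction(truth_gt, imputed_gt):
    """Fraction of the true alleles that also appear in the imputed genotype."""
    t, i = alleles(truth_gt), alleles(imputed_gt)
    if t is None or i is None:
        return None
    # Counter intersection keeps the minimum count of each allele (multiset overlap).
    shared = sum((Counter(t) & Counter(i)).values())
    return shared / len(t)
```

Averaging `genotype_match` over the masked genotypes gives genotype concordance, and averaging `shared_allele_fraction` gives an allele-level concordance that does not depend on the site being biallelic.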

written 3.6 years ago by akang (90)