Question: Beagle imputation results quality control
5.6 years ago by
Russian Federation
eyb wrote:

My region of interest is ~120 kb. I have 300 samples, each genotyped at 14 SNPs in the region. I tried to impute with Beagle to get more SNPs, using the CEU population (the closest one) as a reference. After filtering the CEU samples I had about 350 SNPs per sample to use as a reference.

After imputation I got the same number of SNPs in my population as in CEU. Is it legitimate to use them all? I have a gut feeling that I should filter them by imputation quality or something similar. How do I do that? The output VCF looks something like this:

1       110187031       rs113581509     C       T       .       PASS    AR2=0.468;DR2=0.514;AF=0.06     GT:DS:GP        0|1:0.759:0.242,0.758,0 0|0:0.018:0.982,0.018,0 0|0:0.018:0.982,0.018,0 0|0:0.001:0.999,0.001,0

Can anyone give me a clue on how to filter the results? Or should I use different software?

beagle imputation vcf
modified 11 months ago by zx8754 • written 5.6 years ago by eyb

Hi eyb,

I know it's been several years, but I'm now facing the same problem you had back then: I've just managed to impute my data with Beagle, and now I would like to know how to filter out the low-quality SNPs.

I suspect it is related to the DR2 field, but I'm not quite sure about it... Did you resolve your problem in the end? Thank you very much in advance!

written 2.0 years ago by biosol
22 months ago by
Republic of Ireland
Kevin Blighe wrote:

There is no consensus on how best to filter the post-imputation results. You can use a combination of AR2 (allelic R-squared), DR2 (dosage R-squared), and MAF. Take a look at this pre-print and subsequent publication, where they actually did not do any filtering post-imputation:

Very low depth whole genome sequencing in complex trait association studies (peer reviewed publication).

Variant-level QC

Beagle provides two position level imputation metrics, allelic R-squared and dosage R-squared. Both measures are highly correlated (Supplementary Fig. S8a). Values between 0.3 and 0.8 are typically used for filtering (Brian Browning, personal communication).
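
As a rough illustration of threshold-based filtering on DR2 and MAF, here is a minimal Python sketch. It assumes tab-delimited VCF lines with Beagle 4.x-style INFO fields (DR2, AF) like the record in the question, and a single ALT allele per site. The function names and the default cutoffs (DR2 >= 0.8, MAF >= 0.01) are illustrative choices only, not recommendations from the Beagle authors.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AR2=0.468;DR2=0.514;AF=0.06' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item:
            info[item] = True  # flag-style entry without a value
    return info


def passes_filter(vcf_line, min_dr2=0.8, min_maf=0.01):
    """Return True if a VCF data line meets the (hypothetical) DR2 and MAF cutoffs."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])   # INFO is the 8th VCF column
    dr2 = float(info["DR2"])
    af = float(info["AF"])         # ALT allele frequency; biallelic site assumed
    maf = min(af, 1.0 - af)
    return dr2 >= min_dr2 and maf >= min_maf


def filter_vcf(lines, min_dr2=0.8, min_maf=0.01):
    """Yield header lines unchanged and only those data lines that pass the filter."""
    for line in lines:
        if line.startswith("#") or passes_filter(line, min_dr2, min_maf):
            yield line
```

With these defaults, the record shown in the question (DR2=0.514, AF=0.06) would be dropped; with `min_dr2=0.3` it would be kept. Where exactly to put the cutoff within the 0.3 to 0.8 range is a judgment call that depends on the downstream analysis.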


written 22 months ago by Kevin Blighe

Great, thanks, I'll keep that in mind!

written 22 months ago by biosol

Update: Beagle 5.0 only reports DR2.
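
If a filtering script has to cope with output from both older Beagle versions (AR2 and DR2) and Beagle 5.0 (DR2 only), one option is to look up whichever metric is present. A minimal Python sketch; the helper name `imputation_r2` and the preference order are assumptions made for illustration:

```python
def imputation_r2(info_field, preferred=("DR2", "AR2")):
    """Return (name, value) for the first preferred metric found in INFO, else None."""
    # Keep only key=value entries; flag-style entries carry no metric.
    info = dict(
        item.split("=", 1) for item in info_field.split(";") if "=" in item
    )
    for key in preferred:
        if key in info:
            return key, float(info[key])
    return None
```

For the INFO string from the question, `imputation_r2("AR2=0.468;DR2=0.514;AF=0.06")` picks DR2; an INFO string without either field returns `None`, which the caller can treat as "no imputation metric available" rather than as a failed filter.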

written 19 months ago by Shicheng Guo

Hi Kevin,

java -Xmx32g -jar beagle.16May19.351.jar impute=false gt=Exome.vcf out=Exome.vcf.phasing

As in the example above: suppose we don't supply the map and reference parameters. How accurate is Beagle's phasing in that case?


written 19 months ago by Shicheng Guo