Question: Beagle imputation results quality control
eyb (Russian Federation) wrote, 4.4 years ago:

My region of interest is ~120 kb. I have 300 samples, all genotyped at 14 SNPs in the region. I tried imputing with Beagle to get more SNPs, using the CEU population (the closest one) as the reference. After filtering the CEU samples, I had about 350 SNPs per sample to use as a reference.

After imputation, my samples have the same number of SNPs as the CEU panel. Is it legitimate to use them all? My gut feeling is that I should filter them by imputation quality or something similar. How do I do that? The output VCF looks something like this:

1       110187031       rs113581509     C       T       .       PASS    AR2=0.468;DR2=0.514;AF=0.06     GT:DS:GP        0|1:0.759:0.242,0.758,0 0|0:0.018:0.982,0.018,0 0|0:0.018:0.982,0.018,0 0|0:0.001:0.999,0.001,0
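To see what each field in that record means, here is a minimal Python sketch that unpacks it. It is not part of Beagle, and the names `RECORD`, `parse_info`, and `parse_record` are made up for this example. AR2, DR2, and AF are per-variant INFO values; GT, DS, and GP are per-sample FORMAT values (phased genotype, ALT allele dosage, and genotype probabilities for 0/0, 0/1, 1/1). The dosage should equal P(het) + 2 * P(hom ALT) up to rounding.

```python
"""Unpack the Beagle 4.x VCF record quoted in the question (illustrative sketch)."""

# The record as pasted in the question (space-separated; real VCF uses tabs).
RECORD = (
    "1 110187031 rs113581509 C T . PASS AR2=0.468;DR2=0.514;AF=0.06 "
    "GT:DS:GP 0|1:0.759:0.242,0.758,0 0|0:0.018:0.982,0.018,0 "
    "0|0:0.018:0.982,0.018,0 0|0:0.001:0.999,0.001,0"
)


def parse_info(info: str) -> dict:
    """Split an INFO column like 'AR2=0.468;DR2=0.514' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag-type entries map to ''
    return fields


def parse_record(line: str):
    # Real VCF files are tab-delimited; split() also copes with the
    # space-separated copy above because VCF fields contain no whitespace.
    cols = line.split()
    info = parse_info(cols[7])
    fmt_keys = cols[8].split(":")
    samples = [dict(zip(fmt_keys, s.split(":"))) for s in cols[9:]]
    return info, samples


info, samples = parse_record(RECORD)
print("AR2 =", info["AR2"], "DR2 =", info["DR2"], "AF =", info["AF"])
for s in samples:
    p_ref, p_het, p_alt = (float(x) for x in s["GP"].split(","))
    # Expected ALT allele count implied by the genotype probabilities.
    implied_ds = p_het + 2 * p_alt
    print(s["GT"], "DS =", s["DS"], "implied from GP =", round(implied_ds, 3))
```

For the first sample the probabilities imply a dosage of 0.758 against the reported DS of 0.759, so the two agree up to rounding.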

Can anyone give me a clue on how to filter the results? Or should I use different software?


Hi eyb,

I know it's been several years, but I'm now facing the same problem you had back then: I've just managed to impute my data with Beagle, and now I would like to know how to filter out the poor-quality SNPs.

I suspect it is related to the DR2 field, but I'm not quite sure... Did you finally resolve your problem? Thank you very much in advance!

Reply written 10 months ago by sonia.olaechea
Kevin Blighe wrote, 8 months ago:

There is no consensus on how best to filter post-imputation results. You can use a combination of AR2 (allelic R-squared), DR2 (dosage R-squared), and MAF. Take a look at this preprint and the subsequent publication, where the authors actually did not do any post-imputation filtering:

Very low depth whole genome sequencing in complex trait association studies (peer-reviewed publication).

Variant-level QC

Beagle provides two position level imputation metrics, allelic R-squared and dosage R-squared. Both measures are highly correlated (Supplementary Fig. S8a). Values between 0.3 and 0.8 are typically used for filtering (Brian Browning, personal communication).
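A minimal Python sketch of the combined AR2/DR2/MAF filter described above, assuming a gzip-compressed Beagle VCF with one ALT allele per record. The cut-offs `MIN_R2 = 0.3` and `MIN_MAF = 0.01` are hypothetical illustrations, not recommendations: the quote only says values between 0.3 and 0.8 are typical, and the MAF cut-off is arbitrary. The function names and file paths are likewise made up for this example. Because newer Beagle output may carry only DR2, the sketch applies whichever of AR2/DR2 is present.

```python
"""Sketch of a post-imputation filter for Beagle VCF output (hypothetical)."""
import gzip
from typing import Iterable, Iterator

# Hypothetical thresholds chosen for illustration only.
MIN_R2 = 0.3    # applied to every R-squared metric present (AR2 and/or DR2)
MIN_MAF = 0.01  # minor allele frequency derived from the AF field


def parse_info(info: str) -> dict:
    """Turn 'AR2=0.468;DR2=0.514;AF=0.06' into {'AR2': '0.468', ...}."""
    # partition() gives (key, '=', value); [::2] keeps (key, value).
    return dict(item.partition("=")[::2] for item in info.strip().split(";"))


def keep_record(line: str) -> bool:
    """Return True if a tab-delimited VCF data line passes the filters."""
    info = parse_info(line.split("\t")[7])
    # Beagle 4.x writes AR2 and DR2; later versions may write only DR2.
    r2_values = [float(info[key]) for key in ("AR2", "DR2") if key in info]
    # Assumption: records without any R-squared metric are dropped.
    if not r2_values or min(r2_values) < MIN_R2:
        return False
    # AF is the ALT allele frequency; assumes a single ALT allele.
    af = float(info["AF"])
    return min(af, 1.0 - af) >= MIN_MAF


def filter_vcf(lines: Iterable[str]) -> Iterator[str]:
    """Pass header lines through unchanged and keep passing data lines."""
    for line in lines:
        if line.startswith("#") or keep_record(line):
            yield line


def filter_file(in_path: str, out_path: str) -> None:
    """Filter a gzip-compressed VCF (e.g. Beagle's .vcf.gz output)."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        dst.writelines(filter_vcf(src))
```

For example, `filter_file("imputed.vcf.gz", "imputed.filtered.vcf.gz")` would write only the passing records (both paths are hypothetical). Requiring every available metric to pass is a conservative choice; since AR2 and DR2 are highly correlated, filtering on DR2 alone should usually give similar results. Python's `gzip` writes plain gzip rather than BGZF, so recompress with `bgzip` if you need a tabix index.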

Kevin


Great, thanks, I'll keep that in mind!

Reply written 8 months ago by sonia.olaechea

Update: Beagle 5.0 only reports DR2; AR2 is no longer included.

Reply written 5 months ago by Shicheng Guo

Hi Kevin,

java -Djava.io.tmpdir=./temp/ -Xmx32g -jar beagle.16May19.351.jar impute=false gt=Exome.vcf out=Exome.vcf.phasing

Take the example above: suppose we don't supply the map and reference parameters (map= and ref=). How accurate is Beagle's phasing then?

Thanks.

Reply written 5 months ago by Shicheng Guo