Is it possible to calculate MAF from DS instead of GT when using bcftools +fill-tags?
20 months ago
curious ▴ 750

I need to update the INFO column of a VCF to include a MAF calculated from DS.

Usually I would run bcftools +fill-tags my_vcf.gz, but that computes MAF from GT.

From what I understand of the fill-tags code, only GT is referenced, so this makes sense.

Any suggestions for how I could update MAF based on DS (preferably with an existing tool)?

bcftools • 804 views

What is the definition of DS?


It is the dosage from the imputation software Minimac:

Minimac3 estimates imputed dosage at a haplotype level by finding the posterior probability of the alternate allele at that site. The genotype dosage is next evaluated as the sum of the haplotype dosages of each haplotype. For e.g. if the estimated posterior probability of the alternate allele is 0.98 and 0.97 in each haplotype, the genotype dosage is output as 0.98 + 0.97 = 1.95.

here is an example (including format and header tag):

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Estimated Alternate Allele Dosage : [P(0/1)+2*P(1/1)]">

GT:DS 0|0:0.009


I spent some more time on this with awk. The commands below print one dosage-based MAF value per site (sum of DS divided by twice the number of samples, folded to at most 0.5). I am still not sure how to write these values back into the VCF.

  num_samples=$(bcftools query -l my_vcf.gz | wc -l)
  bcftools query -f '[%DS\t]\n' my_vcf.gz | awk -v n="$num_samples" '{t=0; for(i=1;i<=NF;i++) t+=$i; af=t/(2*n); if(af>0.5) af=1-af; print af}'
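One possible way to get such values back into the VCF is to write them to a tab-delimited file keyed by CHROM, POS, REF and ALT and load that file with bcftools annotate. The following is only a sketch under assumptions: sites are biallelic, samples are diploid, and every name in it (ds_to_maf, annotate_maf_from_ds, maf.tsv.gz, maf_hdr.txt, the output path) is a hypothetical placeholder rather than anything from bcftools itself. If the VCF header already declares INFO/MAF, the -h option can be dropped.

```shell
# Read rows of CHROM POS REF ALT DS1 DS2 ... (tab-separated) from stdin and
# print CHROM POS REF ALT MAF, where MAF = min(AF, 1 - AF) and
# AF = sum(DS) / (2 * number of non-missing samples). Assumes diploid samples.
ds_to_maf() {
  awk 'BEGIN { FS = "\t" }
  {
    sum = 0; n = 0
    for (i = 5; i <= NF; i++)
      if ($i != "" && $i != ".") { sum += $i; n++ }   # skip missing dosages
    if (n == 0) next                                   # no usable samples at this site
    af = sum / (2 * n)
    if (af > 0.5) af = 1 - af                          # fold to the minor allele
    printf "%s\t%s\t%s\t%s\t%.6g\n", $1, $2, $3, $4, af
  }'
}

# Hypothetical wrapper: annotate_maf_from_ds in.vcf.gz out.vcf.gz
# Requires bcftools, bgzip and tabix; writes maf.tsv.gz and maf_hdr.txt
# (placeholder file names) into the current directory.
annotate_maf_from_ds() {
  in_vcf=$1
  out_vcf=$2
  # One row per site: position columns followed by every sample's DS.
  bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%DS]\n' "$in_vcf" \
    | ds_to_maf | bgzip > maf.tsv.gz
  tabix -f -s1 -b2 -e2 maf.tsv.gz
  # Header line for the new INFO tag.
  printf '%s\n' '##INFO=<ID=MAF,Number=1,Type=Float,Description="Minor allele frequency computed from DS">' > maf_hdr.txt
  bcftools annotate -a maf.tsv.gz -h maf_hdr.txt \
    -c CHROM,POS,REF,ALT,INFO/MAF -Oz -o "$out_vcf" "$in_vcf"
}
```

With these definitions, something like annotate_maf_from_ds my_vcf.gz my_vcf.maf.vcf.gz would be the call. Dividing by the count of non-missing dosages rather than the total sample count is a design choice of this sketch; it only differs from the pipeline above when some DS values are missing.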