I have a VCF with the following lines:
##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=G,Type=Integer,Description="Allelic Depths of REF and ALT(s) in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR NORMAL
1 15557977 . TG CA . . . GT:AD:DP 0/1:11,5:16 0/0:21,1:22
1 146728217 . G A . . . GT:AD:DP 0/1:19,21:40 0/0:42,0:42
I am under the impression that to calculate the minor allele frequency (AF), I need to divide AD by DP. I need clarification on this specific calculation, since the AD attribute has two comma-separated values. Do the two comma-separated values indicate the major and minor alleles? Does that mean that, for the minor AF, I only care about the smaller of the two numbers?
Looking at the first line, under the tumor column: AD = 11,5 & DP = 16. Would it be 5/16 = 0.3125?
This is what I am thinking, but I was having trouble finding distinct confirmation in my searches.
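To make the calculation concrete, here is a minimal Python sketch of what I am doing. It assumes, going by the AD description in the header, that the AD values are listed as REF first and then ALT, and that each one is simply divided by DP. The function name `allele_fractions` is my own and not part of any VCF library.

```python
def allele_fractions(format_field, sample_field):
    """Return AD / DP for each allele of one sample column.

    Assumption: AD is ordered REF, ALT(s), as the header description says.
    """
    # Pair each FORMAT key with the matching value in the sample column.
    sample = dict(zip(format_field.split(":"), sample_field.split(":")))
    ad = [int(x) for x in sample["AD"].split(",")]
    dp = int(sample["DP"])
    return [count / dp for count in ad]


# TUMOR column of the first record (1:15557977).
ref_fraction, alt_fraction = allele_fractions("GT:AD:DP", "0/1:11,5:16")
print(ref_fraction, alt_fraction)  # 0.6875 0.3125
```

With this ordering assumption, the second value (5/16 = 0.3125) is the fraction of reads supporting the ALT allele, which is not necessarily the smaller of the two numbers.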
Additionally, sometimes VCF files do not have multiple AD values -- does that mean to calculate the minor AF that I just use the single AD value? Or do I need to subtract the provided AD value from the DP value and then take the smaller value of those two (AD, DP - AD), to calculate it?
Edit for @2nelly -- Example:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT TUMOR NORMAL
chr1 2993807 . C G . PASS AC=4;ADP=211;AN=4;HET=0;HOM=1;NC=0;SF=0,1;WT=0 GT:RDF:DP:ADF:ABQ:FA:RBQ:GQ:ADR:PVAL:AD:RDR:RD:SDP:FREQ 1/1:0:211:178:52:0.9905:36:255:31:2.0028E-123:209:1:1:211:99.05% 1/1:0:211:178:52:0.9905:36:255:31:2.0028E-123:209:1:1:211:99.05%
In this line, under the tumor column: AD = 209 and DP = 211. To calculate the minor AF, I assume the minor AD would actually be 211 - 209 = 2, and the minor AF would then be 2/211 = 0.0095?
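The same question as a minimal Python sketch, for the single-value case. I am assuming here that the lone AD value counts the reads supporting the ALT allele; the header for this file is not shown above, so that is a guess on my part. The function name `single_ad_fractions` is hypothetical.

```python
def single_ad_fractions(format_field, sample_field):
    """Return (AD/DP, (DP-AD)/DP, smaller of the two) for a single-value AD.

    Assumption: the single AD value is the ALT-supporting read count.
    """
    # Pair each FORMAT key with the matching value in the sample column.
    sample = dict(zip(format_field.split(":"), sample_field.split(":")))
    ad = int(sample["AD"])
    dp = int(sample["DP"])
    supporting = ad / dp
    remaining = (dp - ad) / dp
    return supporting, remaining, min(supporting, remaining)


fmt = "GT:RDF:DP:ADF:ABQ:FA:RBQ:GQ:ADR:PVAL:AD:RDR:RD:SDP:FREQ"
tumor = "1/1:0:211:178:52:0.9905:36:255:31:2.0028E-123:209:1:1:211:99.05%"

supporting, remaining, minor = single_ad_fractions(fmt, tumor)
print(round(supporting, 4), round(remaining, 4), round(minor, 4))
# 0.9905 0.0095 0.0095
```

Here AD/DP reproduces the FA and FREQ values already in the record (0.9905, 99.05%), while my "minor AF" guess corresponds to the remaining 2/211.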
Thank you in advance for clarifying this for me!