Question: Manually calculate Minor Allele Frequency in this VCF
cookersjs wrote:

I have a VCF with the following lines:

##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=G,Type=Integer,Description="Allelic Depths of REF and ALT(s) in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  TUMOR   NORMAL
1       15557977        .       TG      CA      .       .       .       GT:AD:DP        0/1:11,5:16     0/0:21,1:22
1       146728217       .       G       A       .       .       .       GT:AD:DP        0/1:19,21:40    0/0:42,0:42

I am under the impression that to calculate the minor allele frequency (AF), I need to divide AD by DP. I need clarification for this specific calculation, since the AD attribute has two comma-separated values. Do the two comma-separated values indicate the major and minor alleles? Does that mean that for the minor AF I only care about the smaller of the two numbers?

Looking at the first line, under the tumor column: AD = 11,5 & DP = 16. Would it be 5/16 = 0.3125?

This is what I am thinking, but I was having trouble finding distinct confirmation in my searches.

Additionally, sometimes VCF files do not have multiple AD values -- does that mean to calculate the minor AF that I just use the single AD value? Or do I need to subtract the provided AD value from the DP value and then take the smaller value of those two (AD, DP - AD), to calculate it?

Edit for @2nelly -- Example:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  TUMOR      NORMAL
chr1    2993807 .       C       G       .       PASS    AC=4;ADP=211;AN=4;HET=0;HOM=1;NC=0;SF=0,1;WT=0  GT:RDF:DP:ADF:ABQ:FA:RBQ:GQ:ADR:PVAL:AD:RDR:RD:SDP:FREQ 1/1:0:211:178:52:0.9905:36:255:31:2.0028E-123:209:1:1:211:99.05%        1/1:0:211:178:52:0.9905:36:255:31:2.0028E-123:209:1:1:211:99.05%

In this line, under the tumor column: AD = 209, DP = 211. To calculate the minor AF, I assume it would actually be 211-209 = 2 for the minor AD, and then 2/211 = 0.0095 ?
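One way to check this is to look the values up by their FORMAT key rather than by position. The sketch below assumes that, in this file, the single-valued AD is the depth of ALT-supporting reads and RD is the depth of REF-supporting reads; the header was not posted, so that reading is an assumption based only on the FREQ and FA values in the same record. The helper name `parse_sample` is hypothetical.

```python
def parse_sample(format_field, sample_field):
    """Pair each FORMAT key with the matching per-sample value."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


fmt = "GT:RDF:DP:ADF:ABQ:FA:RBQ:GQ:ADR:PVAL:AD:RDR:RD:SDP:FREQ"
tumor = parse_sample(
    fmt,
    "1/1:0:211:178:52:0.9905:36:255:31:2.0028E-123:209:1:1:211:99.05%",
)

ad = int(tumor["AD"])  # assumed: reads supporting ALT
dp = int(tumor["DP"])  # total depth
rd = int(tumor["RD"])  # assumed: reads supporting REF

# FREQ is stored as a percentage string such as "99.05%".
reported = float(tumor["FREQ"].rstrip("%")) / 100

alt_fraction = ad / dp
print(f"AD/DP = {alt_fraction:.4f}, FREQ = {reported:.4f}")
print(f"DP - AD = {dp - ad}, RD = {rd}")
```

Under that assumption AD/DP reproduces the reported FREQ, so AD here is already the ALT depth rather than a value to subtract from DP. Note also that DP - AD and RD do not agree in this record, so subtracting from DP is not a reliable substitute for a REF depth the file reports explicitly.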

Thank you in advance for clarifying this for me!

modified 6 months ago • written 6 months ago by cookersjs

Dear cookersjs,

You are right about the division. For the tumor column of the first line:

16 is the total depth in tumor

11 is the depth of REF in tumor

5 is the depth of ALT in tumor
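As a minimal sketch of that division, assuming a biallelic site where AD is written as `REF,ALT` (as in the first two example records), the fraction of ALT-supporting reads can be computed from the FORMAT and sample columns. The function name `alt_read_fraction` is hypothetical and not part of any VCF library.

```python
def alt_read_fraction(format_field, sample_field):
    """Return ALT depth / total depth for one sample at a biallelic site."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    # Assumes AD holds exactly two values: REF depth, then ALT depth.
    ref_depth, alt_depth = (int(x) for x in values["AD"].split(","))
    depth = int(values["DP"])
    return alt_depth / depth


tumor = alt_read_fraction("GT:AD:DP", "0/1:11,5:16")
normal = alt_read_fraction("GT:AD:DP", "0/0:21,1:22")
print(tumor)             # 0.3125
print(round(normal, 4))  # 0.0455
```

The division always uses the second AD value (ALT), not whichever value happens to be smaller; in the second example record the ALT depth (21) is larger than the REF depth (19).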

Regarding the last part of your question, it would be better to post an example of the whole line. Maybe these variants should be filtered out. How did you produce the VCF file? You would normally get more information than this.

modified 6 months ago • written 6 months ago by 2nelly

Thanks for confirming! I have added a VCF sample that illustrates the second case I was describing.

written 6 months ago by cookersjs

This line is surely coming from a different VCF file, and something is wrong with it: the control and case samples are both homozygous, with identical values! How did you call these variants? Can you please post the header of the VCF?

modified 6 months ago • written 6 months ago by 2nelly

The file was provided to me, so I'm not sure how it was generated. That at least clears up that the file was the problem here, thanks!

written 6 months ago by cookersjs

Hello cookersjs,

could you please explain your definition of "minor allele frequency"? My understanding is that this is the frequency of the second most common allele in a given population (and this can be the reference allele as well!).

What you are calculating in your example is the fraction of reads supporting the alternate allele. As described in the header, the first value in the AD field is the number of reads supporting the REF allele, and the second is the number of reads supporting the allele in the ALT column.
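To make the distinction concrete, here is a rough sketch of the population-level definition: count alleles across the genotype (GT) calls of many samples and take the frequency of the second most common one. The function name `minor_allele_frequency` is hypothetical, and the short genotype lists are toy inputs only; a tumor/normal pair is not a population.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Frequency of the second most common allele across GT strings."""
    counts = Counter()
    for gt in genotypes:
        # Accept both unphased ("0/1") and phased ("0|1") genotypes.
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":  # skip missing calls
                counts[allele] += 1
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # no calls, or only one allele observed
    second_most_common_count = counts.most_common(2)[1][1]
    return second_most_common_count / total


print(minor_allele_frequency(["0/1", "0/0"]))  # 0.25
print(minor_allele_frequency(["1/1", "1/1"]))  # 0.0
# Here the reference allele "0" is the minor allele (1 of 6 alleles).
ref_is_minor = minor_allele_frequency(["1/1", "0/1", "1/1"])
```

This works on genotype calls, not read depths, which is why it gives a different quantity from AD/DP.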

fin swimmer

written 6 months ago by finswimmer

Hi finswimmer,

My understanding of minor allele frequency is that it is the frequency at which the alternate allele occurs in the sample. Based on the replies from 2nelly and on my own interpretation, the AD (allelic depth) attribute holds two comma-separated values.

The sum of those two values is equal to the single DP value. In the first example, the DP was 16, and the ADs for ref and alt were 11 and 5, respectively. Since I am interested in the "minor" allele frequency, that would mean the smaller AD value is the one I am interested in. I can get the frequency by dividing the AD(minor) by DP, or 5/16 = 0.3125.

written 6 months ago by cookersjs