Manually calculate Minor Allele Frequency in this VCF
Asked by cookersjs, 4.8 years ago

I have a VCF with the following lines:

##fileformat=VCFv4.2
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=G,Type=Integer,Description="Allelic Depths of REF and ALT(s) in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  TUMOR   NORMAL
1       15557977        .       TG      CA      .       .       .       GT:AD:DP        0/1:11,5:16     0/0:21,1:22
1       146728217       .       G       A       .       .       .       GT:AD:DP        0/1:19,21:40    0/0:42,0:42

I am under the impression that to calculate the minor allele frequency (AF), I need to divide AD by DP. I need clarification on this specific calculation, since the AD attribute has two comma-separated values. Do the two comma-separated values indicate the major and minor alleles? Does that mean that, for the minor AF calculation, I only care about the smaller of the two numbers?

Looking at the first line, under the tumor column: AD = 11,5 & DP = 16. Would it be 5/16 = 0.3125?
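A minimal Python sketch of that arithmetic for a single sample column, assuming the FORMAT keys and the sample values are colon-separated in the same order (as in the lines above) and that the second AD value belongs to the first (here, only) ALT allele. The function name alt_read_fraction is hypothetical and not part of any library.

```python
def alt_read_fraction(format_field: str, sample_field: str) -> float:
    """Fraction of reads supporting the first ALT allele: AD[1] / DP."""
    # Pair the FORMAT keys (e.g. "GT:AD:DP") with this sample's values.
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    # Per the header, AD lists REF depth first, then the ALT depth(s).
    depths = [int(x) for x in values["AD"].split(",")]
    dp = int(values["DP"])
    return depths[1] / dp


# TUMOR column of the first record
print(alt_read_fraction("GT:AD:DP", "0/1:11,5:16"))
```

With the TUMOR column of the second record (0/1:19,21:40), the same function gives 21/40 = 0.525.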

This is what I am thinking, but I had trouble finding clear confirmation in my searches.

Additionally, sometimes VCF files do not have multiple AD values -- does that mean to calculate the minor AF that I just use the single AD value? Or do I need to subtract the provided AD value from the DP value and then take the smaller value of those two (AD, DP - AD), to calculate it?

Edit for @2nelly -- Example:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  TUMOR      NORMAL
chr1    2993807 .       C       G       .       PASS    AC=4;ADP=211;AN=4;HET=0;HOM=1;NC=0;SF=0,1;WT=0  GT:RDF:DP:ADF:ABQ:FA:RBQ:GQ:ADR:PVAL:AD:RDR:RD:SDP:FREQ 1/1:0:211:178:52:0.9905:36:255:31:2.0028E-123:209:1:1:211:99.05%        1/1:0:211:178:52:0.9905:36:255:31:2.0028E-123:209:1:1:211:99.05%

In this line, under the TUMOR column: AD = 209 and DP = 211. To calculate the minor AF, I assume the minor AD would actually be 211 - 209 = 2, and then 2/211 = 0.0095?
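A sketch of the arithmetic proposed above, assuming AD in this file holds only the ALT read count and that every read not counted in AD is "other". The helper name single_ad_fractions is hypothetical. Note that 209/211 rounds to 0.9905, which matches the FREQ=99.05% value in the same record. The record also has an RD field of 1, so if RD is a REF depth, the 2 reads in DP - AD would not all be REF reads.

```python
def single_ad_fractions(ad: int, dp: int) -> tuple[float, float]:
    """Return (AD/DP, (DP-AD)/DP), assuming AD is the ALT read count only."""
    other = dp - ad  # reads not counted in AD (may include more than REF reads)
    return ad / dp, other / dp


alt_frac, other_frac = single_ad_fractions(ad=209, dp=211)
print(round(alt_frac, 4), round(other_frac, 4))
print(round(min(alt_frac, other_frac), 4))  # the "minor" value proposed above
```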

Thank you in advance for clarifying this for me!

Tags: vcf minor allele frequency cancer

Dear cookersjs,

You are right about the division:

16 is the total depth in tumor

11 is the depth of REF in tumor

5 is the depth of ALT in tumor
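To apply the same REF/ALT breakdown to every sample in the original example, here is a rough standard-library sketch. It assumes tab-separated columns as in a real VCF file (the forum rendering shows spaces), the sample order TUMOR, NORMAL from the header line, and at most one ALT allele per record. EXAMPLE_RECORDS and depth_breakdown are illustrative names.

```python
# The two records from the question, re-typed with tabs as in a real VCF.
EXAMPLE_RECORDS = [
    "1\t15557977\t.\tTG\tCA\t.\t.\t.\tGT:AD:DP\t0/1:11,5:16\t0/0:21,1:22",
    "1\t146728217\t.\tG\tA\t.\t.\t.\tGT:AD:DP\t0/1:19,21:40\t0/0:42,0:42",
]
SAMPLES = ["TUMOR", "NORMAL"]  # order of the sample columns in the header


def depth_breakdown(record: str) -> list[tuple[str, int, int, int]]:
    """Return (sample, REF depth, ALT depth, DP) for each sample column."""
    cols = record.split("\t")
    keys = cols[8].split(":")  # FORMAT column, e.g. GT:AD:DP
    rows = []
    for name, sample in zip(SAMPLES, cols[9:]):
        values = dict(zip(keys, sample.split(":")))
        ref_depth, alt_depth = (int(x) for x in values["AD"].split(",")[:2])
        rows.append((name, ref_depth, alt_depth, int(values["DP"])))
    return rows


for rec in EXAMPLE_RECORDS:
    pos = rec.split("\t")[1]
    for name, ref_d, alt_d, dp in depth_breakdown(rec):
        print(pos, name, f"REF {ref_d}/{dp}", f"ALT {alt_d}/{dp} = {alt_d / dp:.4f}")
```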

Regarding the last part of your question, it would be better to upload an example of the whole line. These variants may need to be filtered out. How did you produce the VCF file? You would normally get more information than this.


Thanks for confirming! I have added a vcf sample that illustrates the second case I was describing


This surely comes from a different VCF file; something is wrong with this line. The control and case samples are both homozygous with exactly the same values! How did you call these variants? Can you please post the VCF header?


The file was provided to me, I'm not sure how it was generated. That at least clears up that the file was the problem here, thanks!


Hello cookersjs,

Could you please explain your definition of "minor allele frequency"? My understanding is that it is the frequency of the second most common allele in a given population (and this can be the reference allele as well!).

What you are calculating in your example is the fraction of reads supporting the alternate allele. As described in the header, the first value in the AD field is the number of reads supporting the REF allele, and the second is the number supporting the allele in the ALT column.
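To make the distinction concrete, here is a sketch of a population-level minor allele frequency computed from genotype calls rather than read counts. The genotype list is hypothetical, sites are assumed to be biallelic and diploid, and population_maf is an illustrative name. With the calls below the ALT allele is the more common one, so the minor allele is REF.

```python
def population_maf(genotypes: list[str]) -> tuple[str, float]:
    """Minor allele ("REF" or "ALT") and its frequency from GT calls.

    Assumes a biallelic site: any non-"0" allele is counted as ALT.
    """
    alt = total = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")  # handle phased and unphased
        called = [a for a in alleles if a != "."]  # skip missing alleles
        total += len(called)
        alt += sum(1 for a in called if a != "0")
    alt_freq = alt / total
    # On an exact 0.5 tie, ALT is reported as the minor allele.
    if alt_freq <= 0.5:
        return "ALT", alt_freq
    return "REF", 1 - alt_freq


# Hypothetical calls for four individuals at one site
print(population_maf(["0/0", "0/1", "1/1", "1/1"]))
```

Here 5 of the 8 called alleles are ALT, so REF is the minor allele with frequency 3/8. By contrast, the 5/16 figure in the question is a per-sample read fraction at one site and does not count alleles across individuals.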

fin swimmer


Hi finswimmer,

My understanding of minor allele frequency is that it is the frequency at which the alternate allele occurs in the sample. Based on the replies from 2nelly and on my own interpretation, the AD (allelic depth) attribute contains two comma-separated values.

The sum of those two values is equal to the single DP value. In the first example, the DP was 16, and the ADs for ref and alt were 11 and 5, respectively. Since I am interested in the "minor" allele frequency, that would mean the smaller AD value is the one I am interested in. I can get the frequency by dividing the AD(minor) by DP, or 5/16 = 0.3125.
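Using the two TUMOR AD/DP pairs from the question, the following sketch compares "smaller AD value / DP" with "ALT AD value / DP". Both helper names are hypothetical. For the first record the two agree (5/16 = 0.3125), but for the second record the smaller value is the REF depth, so the approaches give 19/40 = 0.475 versus 21/40 = 0.525.

```python
def smaller_ad_fraction(ad: list[int], dp: int) -> float:
    """The approach described above: smaller AD value divided by DP."""
    return min(ad) / dp


def alt_ad_fraction(ad: list[int], dp: int) -> float:
    """Fraction of reads supporting ALT: second AD value divided by DP."""
    return ad[1] / dp


# TUMOR columns of the two example records as (AD, DP)
for ad, dp in [([11, 5], 16), ([19, 21], 40)]:
    print(ad, dp, smaller_ad_fraction(ad, dp), alt_ad_fraction(ad, dp))
```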
