Calculating Variant Allele Frequency
6.0 years ago
novice ★ 1.1k

I'm reading an interesting paper, Malachi et al., Cell Systems 2015, that talks about Variant Allele Frequency (VAF). Could someone please help me understand how they calculated this value? I couldn't find it in the methods.

edit

For example, let's say I have a bunch of VCF files that I would like to find VAF for. How would I go about doing that? Here's an example of a VCF file:

##fileformat=VCFv4.1
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AB,Number=1,Type=Float,Description="Allele Balance of Alt Allele">
##INFO=<ID=RD,Number=1,Type=Integer,Description="Depth of Ref allele">
##INFO=<ID=SAP,Number=1,Type=Float,Description="Strand Bias Probability of the Alt Allele">
##INFO=<ID=RAP,Number=1,Type=Float,Description="Strand Bias Probability of the Ref Allele">
##INFO=<ID=DP4,Number=4,Type=Float,Description="Fwd Strand Ref Counter, Rev Strand Ref Counter, Fwd Strand Alt Counter, Rev Strand Alt Counter">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  genotype
chr1    808922  .   G   A   249 PASS    DP=49;AB=1.0000;RD=0;AD=49;SAP=0.5510;RAP=0;DP4=0,0,27,22   GT:GQ:DP    1/1:145:49
chr1    808928  .   C   T   249 PASS    DP=51;AB=1.0000;RD=0;AD=51;SAP=0.5294;RAP=0;DP4=0,0,27,24   GT:GQ:DP    1/1:145:51
chr1    876499  .   A   G   217 PASS    DP=49;AB=1.0000;RD=0;AD=49;SAP=0.6939;RAP=0;DP4=0,0,34,15   GT:GQ:DP    1/1:126:49

6.0 years ago
Noushin N ▴ 600

Variant allele frequency in this case refers to the fraction of sequencing reads overlapping a genomic coordinate that support the non-reference (mutant/alternate) allele.

Typically, this information is either listed explicitly in the VCF file or can be readily extracted from it. If not, and you have the genomic coordinates of the mutation of interest plus the BAM file, you can run the samtools mpileup command to get the alleles observed across all reads overlapping that position.

I use a Python script to parse the output of mpileup into a human-readable allelic table.
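For readers who want to see what parsing mpileup output involves, here is a minimal sketch. It is not the script mentioned above; the function names `count_pileup_bases` and `pileup_vaf` are made up for illustration. It assumes the standard tab-separated pileup columns (chromosome, position, reference base, depth, read bases, base qualities) and uses the number of counted bases, excluding deletion placeholders, as the denominator.

```python
import re
from collections import Counter


def count_pileup_bases(ref_base, bases):
    """Count alleles in the read-bases column of one mpileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read-start marker; the next character encodes mapping quality.
            i += 2
        elif c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            length = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
        elif c in ".,":
            # Match to the reference on the forward (.) or reverse (,) strand.
            counts[ref_base.upper()] += 1
            i += 1
        elif c.upper() in "ACGTN":
            # Mismatch; lower case means the read is on the reverse strand.
            counts[c.upper()] += 1
            i += 1
        else:
            # '$' (read end), '*' (deleted base) and other symbols are skipped.
            i += 1
    return counts


def pileup_vaf(line, alt):
    """Return the fraction of counted bases supporting `alt`, or None if no bases."""
    chrom, pos, ref, depth, bases = line.rstrip("\n").split("\t")[:5]
    counts = count_pileup_bases(ref, bases)
    total = sum(counts.values())
    return counts[alt.upper()] / total if total else None


if __name__ == "__main__":
    # Hypothetical pileup line, not taken from real data.
    example = "chr1\t808922\tG\t8\t..,,A^Fa$.+2AGt\tIIIIIIII"
    print(pileup_vaf(example, "A"))
```

Whether deletions and low-quality bases should count toward the denominator is a judgment call, so treat the choice made here as one option rather than a standard.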


Thanks for the reply. Would you be able to provide an example pipeline for finding VAF from a VCF file? (please see edit)


The referenced script parses allele counts from an mpileup-format file. In the header of the VCF snippet you attached, you can see:

> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
> ##INFO=<ID=RD,Number=1,Type=Integer,Description="Depth of Ref allele">


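Each record also carries an AD value in its INFO column (the depth of the alt allele), although AD is not declared in the header. The variant allele frequency (assuming that the germline is homozygous reference) is

VAF = AD / DP = Depth of Alt Allele / Total Depth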
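As a rough sketch of how this could be applied to the posted file, the snippet below reads AD and DP from the INFO column of each record and divides them. It assumes the layout of this particular VCF, where AD is a single integer in INFO; many callers instead report AD per sample in the FORMAT column as ref,alt counts, which this code does not handle. The function names are made up for illustration, and columns are split on any whitespace, which works here only because the INFO values contain no spaces.

```python
def parse_info(info):
    """Turn 'DP=49;AB=1.0000;...' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def vaf_from_vcf_line(line):
    """Return AD / DP for one data line, or None if the depth is zero."""
    cols = line.split()          # assumption: no spaces inside any column
    info = parse_info(cols[7])   # INFO is the 8th VCF column
    depth = int(info["DP"])
    alt_depth = int(info["AD"])
    return alt_depth / depth if depth else None


def vaf_table(lines):
    """Yield (chrom, pos, ref, alt, vaf) for every non-header line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.split()
        yield cols[0], int(cols[1]), cols[3], cols[4], vaf_from_vcf_line(line)


if __name__ == "__main__":
    record = ("chr1\t808922\t.\tG\tA\t249\tPASS\t"
              "DP=49;AB=1.0000;RD=0;AD=49;SAP=0.5510;RAP=0;DP4=0,0,27,22\t"
              "GT:GQ:DP\t1/1:145:49")
    for row in vaf_table(["##fileformat=VCFv4.1", record]):
        print(row)
```

For the three records in the question, AD equals DP, so each VAF comes out as 1.0, which agrees with the AB=1.0000 value already present in the INFO column.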


But what do you mean by "assuming the germline is homozygous reference"?


By that I mean that in the germline (normal cells), the individual has two copies of the reference allele. I mention this to exclude cases where the germline is heterozygous and the single reference allele is lost through a somatic copy number event (loss of heterozygosity, LOH).


baseParser.py does not seem to be working these days. Are you still maintaining it?


It works; it just needs to be run with Python 2. And thanks, Noushin, very useful!