Calculation of VAF (variant allele frequency)
3.4 years ago
JJ ▴ 610

Hi all,

So, I am interested in computing the VAF (variant allele frequency) by extracting values from a VCF file. From my understanding, the VAF is calculated as follows: AD(second entry)/DP so e.g.

AD = 8,4
DP = 12
VAF = 4/12

AD = 5,10
DP = 15
VAF = 10/15


Is this correct?

However, I've seen that sometimes the frequency of the most frequent (or less frequent) allele is computed. Hence it would be 8/12 and 10/15 (or 4/12 and 5/15 for the less frequent allele).

Or is this the difference between the AF (allele frequency), MAF (minor allele frequency) and VAF (variant allele frequency)? (also see MAF vs VAF on this topic)

And what about the 1/2 SNPs? Do I have two values for VAF then?

AD = 0,5,10
DP = 15
VAF1 = 5/15
VAF2 = 10/15


I am slightly confused here. Thanks for your input!
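For reference, the arithmetic in the examples above can be written as a short Python sketch. The function name `alt_allele_fractions` is made up for illustration, and falling back to sum(AD) when no DP is given is an assumption (depending on the caller, DP and the sum of AD do not always agree). It only reproduces the "ALT entry of AD divided by DP" calculation, giving one value per ALT allele.

```python
from fractions import Fraction


def alt_allele_fractions(ad, dp=None):
    """Return one fraction per ALT allele (hypothetical helper).

    ad: per-allele read counts as in the VCF AD field, reference allele first.
    dp: denominator to use; if omitted, sum(ad) is used (an assumption).
    """
    depth = sum(ad) if dp is None else dp
    if depth == 0:
        # No reads: the fraction is undefined for every ALT allele.
        return [None] * (len(ad) - 1)
    # Skip the first entry (reference allele); one fraction per ALT allele.
    return [Fraction(count, depth) for count in ad[1:]]


print(alt_allele_fractions([8, 4], dp=12))      # [Fraction(1, 3)]
print(alt_allele_fractions([0, 5, 10], dp=15))  # [Fraction(1, 3), Fraction(2, 3)]
```

With a multi-allelic site such as AD = 0,5,10 this yields two values, one for each ALT allele.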

3.4 years ago

MAF = minor allele frequency: the frequency of the minor allele in the POPULATION.

Say you have a set of genotyped samples. Then:

1- Calculate the frequency of one of the alleles (usually the non-reference allele) for a given variant:

 freq(a) = (2 x samples_with_geno_aa + samples_with_geno_Aa) / (2 x samples)


2- freq(A) = 1 - freq(a)

3- Now comes the practical and problematic issue of "minor". The non-reference allele is not necessarily the less frequent one. Even if it is in your population, it might not be the less frequent allele in another population. And if freq(a) is 0.49, the next batch of samples for this SNP could give freq(a) = 0.51, in which case the minor allele is A instead of a.

So whenever you calculate a MAF, you need to state explicitly which allele it refers to. Don't expect it to be the non-reference one.
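The steps above can be sketched in Python as follows. This is only an illustration for a biallelic variant in diploid samples; the function names and the genotype counts in the usage lines are made up.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (freq_A, freq_a) from genotype counts at one biallelic variant.

    Each diploid sample contributes two alleles, hence the factor 2.
    """
    n_samples = n_AA + n_Aa + n_aa
    if n_samples == 0:
        raise ValueError("no genotyped samples")
    freq_a = (2 * n_aa + n_Aa) / (2 * n_samples)
    freq_A = 1 - freq_a
    return freq_A, freq_a


def minor_allele(n_AA, n_Aa, n_aa):
    """Return (allele_label, frequency) for the less frequent allele.

    The label is returned together with the value because the minor allele
    is not necessarily the non-reference one. Ties are reported as "a".
    """
    freq_A, freq_a = allele_frequencies(n_AA, n_Aa, n_aa)
    if freq_a <= freq_A:
        return "a", freq_a
    return "A", freq_A


# Made-up counts: 49 AA, 42 Aa, 9 aa -> "a" is the minor allele.
print(minor_allele(49, 42, 9))
# Made-up counts where the non-reference allele "a" is the major one.
print(minor_allele(9, 42, 49))
```

Returning the allele label alongside the number is the point of the design: the same variant can report a different minor allele in a different set of samples.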

VAF = variant allele FRACTION. (I prefer "fraction" to "frequency" because I come from a population genetics background and use "frequency" for populations, not for read sampling.)

VAF is used mainly in two scenarios:

• Germline genotyping: in a diploid organism the VAF helps to check whether the genotype calling went well. The VAF of a heterozygous locus with depth > 80 should be near 50%. With depth in [30-40], the VAF of a het can vary between 0.35 and 0.65.

A VAF around 0.25 could mean that there is another copy of this region in the genome: one copy is "AA" and the other is "Aa" (25% of "a").

• Cancer genotyping: in cancer, a sample is a mix of cell lineages, each with its own mutations. Here we use the VAF as a proxy for the proportion of cells belonging to each cancer cell lineage (this is like the MAF for the cancer cells in the sequenced material).
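To get a feel for how much the VAF of a germline het can scatter at a given depth, here is a rough Python sketch. It assumes every read is an independent draw with probability 0.5 of carrying the variant allele (a plain binomial model with no reference bias or mapping artifacts), which is a simplification, so real data will usually spread somewhat more. The function name `het_vaf_interval` is made up.

```python
from math import comb


def het_vaf_interval(depth, p=0.5, coverage=0.95):
    """Central interval of the observed VAF under a binomial read-sampling model.

    depth: number of reads at the site (must be >= 1).
    p: expected fraction of variant reads (0.5 for a diploid het).
    coverage: probability mass the interval should roughly contain.
    Returns (low_vaf, high_vaf).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    tail = (1 - coverage) / 2
    # Probability of observing exactly k variant reads out of `depth`.
    pmf = [comb(depth, k) * p**k * (1 - p) ** (depth - k) for k in range(depth + 1)]

    # Smallest k whose lower-tail probability exceeds `tail`.
    cumulative, low = 0.0, 0
    for k in range(depth + 1):
        cumulative += pmf[k]
        if cumulative > tail:
            low = k
            break

    # Largest k whose upper-tail probability exceeds `tail`.
    cumulative, high = 0.0, depth
    for k in range(depth, -1, -1):
        cumulative += pmf[k]
        if cumulative > tail:
            high = k
            break

    return low / depth, high / depth


for depth in (10, 30, 80):
    print(depth, het_vaf_interval(depth))
```

Under this model the interval narrows as depth grows; at depth 10 it already spans 0.2 to 0.8, so a single low-depth site says little on its own.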


Now it's clear to me what the difference between MAF and VAF is and I am looking into the variant allele fraction and not MAF. I have a couple of cancer samples (actually cancer cell lines) and I want to look at the heterogeneity within each sample. I expect a very high heterogeneity of course. Even more so, as the ploidy can vary from cell to cell.

I have three questions remaining:

1) Relating to germline genotyping: what could have gone wrong? And how can I check that for cancer samples, where high heterogeneity is expected? Moreover, I have a lower depth [10-30]. What range could be expected for a het here?

2) Since it's called the VARIANT allele fraction, I assume it's always AD(second entry)/DP as I stated above. Correct?

3) So what about the 1/2 SNPs? Do I have two values for VAF then?

Again, thanks so much!!


Hi there, do you have any recommended material for understanding variant allele frequency and its calculation? Thank you!