Question: Calculation of VAF (variant allele frequency)
JJ520 wrote:

Hi all,

I am interested in computing the VAF (variant allele frequency) by extracting values from a VCF file. From my understanding, the VAF is calculated as AD(second entry)/DP, e.g.:

``````AD = 8,4
DP = 12
VAF = 4/12

AD = 5,10
DP = 15
VAF = 10/15
``````

Is this correct?

However, I've seen that sometimes the frequency of the most frequent (or less frequent) allele is computed. Hence it would be 8/12 and 10/15 (or 4/12 and 5/15 for the less frequent allele).

Or is this the difference between the AF (allele frequency), MAF (minor allele frequency) and VAF (variant allele frequency)? (also see MAF vs VAF on this topic)

And what about the 1/2 SNPs? Do I have two values for VAF then?

``````AD = 0,5,10
DP = 15
VAF1 = 5/15
VAF2 = 10/15
``````
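The arithmetic in the two examples above can be written as a minimal Python sketch. It assumes that AD lists the reference allele depth first, followed by one depth per alternate allele, so a 1/2 genotype yields one value per alternate allele. The function name `variant_allele_fractions` is hypothetical. Whether DP or the sum of AD is the better denominator depends on the variant caller, since DP may count reads that are not assigned to any allele in AD; the sketch takes DP when given and falls back to the sum of AD otherwise.

```python
def variant_allele_fractions(ad, dp=None):
    """Return one VAF per alternate allele.

    ad: allelic depths in AD order (reference allele first, then each alt).
    dp: denominator to use; if None, the sum of AD is used instead
        (an assumption of this sketch, not a rule from the thread).
    """
    depth = sum(ad) if dp is None else dp
    if depth == 0:
        # No reads: the fraction is undefined for every alternate allele.
        return [None] * (len(ad) - 1)
    return [alt_depth / depth for alt_depth in ad[1:]]


print(variant_allele_fractions([8, 4], dp=12))      # one alternate allele
print(variant_allele_fractions([0, 5, 10], dp=15))  # 1/2 genotype: two values
```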

I am slightly confused here. Thanks for your input!

sequencing • 6.7k views
modified 19 months ago by Pablo Marin-Garcia1.8k • written 19 months ago by JJ520
Pablo Marin-Garcia1.8k wrote:

MAF = minor allele frequency, i.e. the frequency of the minor allele in the POPULATION.

You have a bunch of samples genotyped, and then:

1- You calculate the frequency of one of the alleles (usually the non-reference allele) for a given variant:

`````` freq(a) =( sum(samples_with_geno_aa x 2) + sum(samples_with_geno_Aa)) / (samples x 2)
``````

2- `freq(A) = 1-freq(a)`

3- Now comes the practical and problematic issue of "minor". The non-reference allele is not necessarily the less frequent one. Or, if it is the less frequent one in your population, it may not be the less frequent one in another population. Or, if freq(a) is 0.49, the next batch of samples for this SNP could give freq(a) = 0.51, and then the minor allele is A instead of a.

So, whenever you calculate a MAF, you need to state explicitly which allele is the minor allele for this variant. Don't expect it to be the non-reference one.
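The three steps can be sketched in Python as follows. This is only an illustration for a biallelic variant in diploid samples; the function names and the genotype counts are hypothetical, not real data. The point of `minor_allele` is that it returns the allele together with its frequency, so the minor allele is always stated explicitly.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Steps 1 and 2: frequencies of alleles 'a' and 'A' from genotype counts."""
    n_samples = n_AA + n_Aa + n_aa
    if n_samples == 0:
        raise ValueError("no genotyped samples")
    # Each aa sample carries two copies of 'a', each Aa sample carries one.
    freq_a = (2 * n_aa + n_Aa) / (2 * n_samples)
    return {"A": 1 - freq_a, "a": freq_a}


def minor_allele(freqs):
    """Step 3: return (allele, frequency) for the less frequent allele.

    On an exact tie, the first allele in the dict is returned.
    """
    allele = min(freqs, key=freqs.get)
    return allele, freqs[allele]


# Illustrative genotype counts, not real data.
freqs = allele_frequencies(n_AA=60, n_Aa=30, n_aa=10)
print(minor_allele(freqs))
```

With the counts swapped (10 AA, 30 Aa, 60 aa), the same code reports A as the minor allele, which is the situation described in step 3.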

VAF: the concept of VAF is variant allele FRACTION. (I prefer "fraction" to "frequency" because I come from a population genetics background, where "frequency" refers to populations, not to read sampling.)

VAF is used mainly in two scenarios:

• germline genotyping: in a diploid organism, the VAF helps to check whether genotype calling went well. The VAF of a heterozygous locus with depth > 80 should be near 50%. With a depth of 30-40, the VAF of a het could vary between 0.35 and 0.65.

A VAF around 0.25 could mean that there is another copy of this region in the genome: one copy is "AA" and the other is "Aa" (25% "a").

• cancer genotyping: in cancer, a single sample is a mix of cell populations, each with its own mutations. Here we use the VAF as a proxy for how many cells belong to each cancer cell lineage (this is like the MAF for the cancer cells in the sequenced material).
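To get a feeling for how much a germline VAF can scatter at a given depth, one simple model treats each read as an independent draw that carries the variant allele with a fixed probability (0.5 for a het, 0.25 for the duplicated-region case). The sketch below is a rough illustration under that assumption only: it ignores reference bias, mapping artefacts, and sequencing errors, so real data will usually be more dispersed, and it is not meant to reproduce the ranges quoted above. The function name `vaf_interval` is hypothetical.

```python
from math import comb


def vaf_interval(depth, expected_vaf=0.5, coverage=0.95):
    """Central interval of observed VAFs under binomial read sampling.

    depth: number of reads covering the site (must be > 0).
    expected_vaf: true fraction of variant reads (0.5 for a diploid het).
    coverage: probability mass that the central interval should contain.
    """
    tail = (1 - coverage) / 2
    cumulative = 0.0
    low = high = None
    for k in range(depth + 1):
        # Probability of seeing exactly k variant reads out of `depth`.
        cumulative += (comb(depth, k)
                       * expected_vaf ** k
                       * (1 - expected_vaf) ** (depth - k))
        if low is None and cumulative >= tail:
            low = k    # smallest k with P(X <= k) >= tail
        if high is None and cumulative >= 1 - tail:
            high = k   # smallest k with P(X <= k) >= 1 - tail
            break
    return low / depth, high / depth


for depth in (10, 30, 80):
    print(depth, vaf_interval(depth))
```

The interval narrows as depth grows, which is why a low-depth het call can sit far from 0.5 without anything being wrong.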

Now it's clear to me what the difference between MAF and VAF is and I am looking into the variant allele fraction and not MAF. I have a couple of cancer samples (actually cancer cell lines) and I want to look at the heterogeneity within each sample. I expect a very high heterogeneity of course. Even more so, as the ploidy can vary from cell to cell.

I have three questions remaining:

1) Relating to germline genotyping: what could have gone wrong in that case, and how can I check for it in cancer samples, where high heterogeneity is expected? Moreover, I have a lower depth [10-30]. What range could be expected for a het here?

2) Since it's called the VARIANT allele fraction, I assume it's always AD(second entry)/DP, as I stated above. Correct?

3) So what about the 1/2 SNPs? Do I have two values for VAF then?

Again, thanks so much!!