Correct way to calculate VAF (variant allele fraction) from a VCF file
21 months ago
prasundutta87

Hi,

Generally, in cancer variation studies, the variant allele fraction (VAF) is calculated as alt reads / total reads at the locus.
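A minimal sketch of that formula in Python. The function name `vaf` and the read counts are hypothetical, and which number to use as "total reads" is exactly the open question in this post.

```python
def vaf(alt_reads: int, total_reads: int) -> float:
    """Variant allele fraction: reads supporting ALT divided by total reads at the locus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


# Hypothetical locus: 12 of 48 reads support the alternate allele
print(vaf(12, 48))  # 0.25
```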

In a VCF file, the FORMAT/AD tag holds one value per allele, REF first and then each ALT allele in the order listed. At a biallelic site it has two values, e.g. 43,45, which are the allelic depths of the ref and alt alleles for a sample.
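A small sketch of how an AD value splits into REF and ALT depths, assuming the value is taken as the raw comma-separated string from the sample column (the helper name `parse_ad` is hypothetical). A missing value such as "." makes `int()` raise a `ValueError`, which is left to the caller.

```python
from __future__ import annotations


def parse_ad(ad_field: str) -> tuple[int, list[int]]:
    """Split a FORMAT/AD value like '43,45' into the REF depth and a list of ALT depths."""
    depths = [int(x) for x in ad_field.split(",")]  # raises ValueError on '.'
    return depths[0], depths[1:]


ref, alts = parse_ad("43,45")
print(ref, alts)  # 43 [45]
print(round(alts[0] / (ref + alts[0]), 3))  # 0.511
```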

The FORMAT field also has the DP tag, which is the total depth. In short, the difference between AD and DP is that a read supporting the ref or alt allele is counted towards AD only if it is informative, whereas DP counts both informative and uninformative reads. More info can be found here: https://gatk.broadinstitute.org/hc/en-us/articles/360035532252-Allele-Depth-AD-is-lower-than-expected
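Because AD only counts informative reads, the gap between DP and the sum of AD gives a rough count of the reads that were not assigned to any allele. A sketch using the AD example 43,45 with a hypothetical DP of 95; the function name is illustrative only.

```python
def unassigned_reads(ad_field: str, dp: int) -> int:
    """Reads counted in FORMAT/DP but not in any FORMAT/AD value."""
    ad_total = sum(int(x) for x in ad_field.split(","))
    # Following the GATK explanation linked above, this is expected to be >= 0;
    # a negative value would mean the record deserves a closer look.
    return dp - ad_total


print(unassigned_reads("43,45", 95))  # 7
```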

My question is: since the sum of the ref and alt reads in the AD tag may not match the DP tag, what is the best way to calculate VAF? Should I focus only on the AD tag and calculate alt reads / (ref reads + alt reads), or is it okay to calculate alt reads / DP?
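The two options side by side, as a sketch for a biallelic record (index 1 of AD is the first ALT allele). The counts AD=43,45 and DP=95 are hypothetical; they show that the AD-based denominator gives a higher VAF whenever DP also includes uninformative reads. At a multiallelic site, `sum(ad)` includes the other ALT alleles as well.

```python
from __future__ import annotations


def vaf_from_ad(ad: list[int], alt_index: int = 1) -> float:
    """alt reads / (ref reads + alt reads), using only informative reads from AD.

    Raises ZeroDivisionError if AD sums to zero.
    """
    return ad[alt_index] / sum(ad)


def vaf_from_dp(ad: list[int], dp: int, alt_index: int = 1) -> float:
    """alt reads / DP, where DP also counts uninformative reads."""
    return ad[alt_index] / dp


ad, dp = [43, 45], 95
print(round(vaf_from_ad(ad), 3))      # 0.511
print(round(vaf_from_dp(ad, dp), 3))  # 0.474
```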

Regards,
Prasun


Since the AD value reflects how many reads actually contributed support for a given allele at the site, I would focus only on the AD tag. However, it can get complicated, because a VAF calculated from AD may not agree with DP or with the called genotype (GT). Another problem is that the read counts shown in IGV may differ from the VAF calculated from the AD values. It would probably be a good idea to discuss this possible discrepancy with the people who will use the data.


Thanks! I am actually going to do that now.


There was a similar question previously: VCF AF and %Freq


Hi,

Thanks for this, but I don't think it answers my question. INFO/AF only describes the frequency of the alternate allele among the called genotypes at a site. For example, it can be 0.5 at a locus with a heterozygous genotype (0/1) in a single-sample VCF. VAF, on the other hand, is calculated from the numbers of reads supporting the alternate and reference alleles.
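To make the distinction concrete, here is a sketch contrasting a genotype-based fraction (the share of non-reference alleles in GT, which is the idea behind INFO/AF in a single-sample VCF) with a read-based VAF from AD. Exactly how INFO/AF is filled in depends on the caller, so treat this as an illustration; the function name and counts are hypothetical.

```python
def genotype_alt_fraction(gt: str) -> float:
    """Share of called alleles in a GT string (e.g. '0/1', '1|1') that are non-reference."""
    alleles = [a for a in gt.replace("|", "/").split("/") if a != "."]
    if not alleles:
        raise ValueError("genotype has no called alleles")
    return sum(a != "0" for a in alleles) / len(alleles)


# A heterozygous call gives 0.5 regardless of the read counts behind it,
print(genotype_alt_fraction("0/1"))  # 0.5
# while the read-based VAF for AD=43,45 is 45 / 88
print(round(45 / (43 + 45), 3))  # 0.511
```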


VAF and AF can both refer to allele frequency. The resources in the other thread are specifically related to your original question.


Thanks Igor, I understand. Unfortunately, the first two links do not open (and I believe that is where my answer lies). I will check the GATK website itself; they may have changed the links. Thanks a lot again!


Hi Prasun. Did you find a way to determine VAF from VCF files? I need to find the allelic fractions in my WES data (BAM/VCF) and I don't know how to do it. :(


Hi,

If it's a multisample VCF file, you need to use the correct tags to calculate VAF on a per-sample basis. You can get an idea from here: https://github.com/samtools/bcftools/issues/1731

In either case, try to get the latest version, because the second one won't work unless you have at least bcftools v1.15.
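For a multisample VCF, the AD value has to be read from each sample's own column. A minimal sketch for a single tab-separated data line, assuming the sample names are taken from the #CHROM header line and computing AD[alt] / sum(AD) per ALT allele; the function name and the example record are hypothetical. For real files, a VCF library or `bcftools query` is a more robust choice than splitting strings by hand.

```python
from __future__ import annotations


def per_sample_vaf(vcf_line: str, samples: list[str]) -> dict[str, list[float] | None]:
    """Per-sample VAF for each ALT allele, computed as AD[alt] / sum(AD)."""
    cols = vcf_line.rstrip("\n").split("\t")
    fmt_keys = cols[8].split(":")  # column 9 is FORMAT
    if "AD" not in fmt_keys:
        raise ValueError("record has no FORMAT/AD")
    ad_idx = fmt_keys.index("AD")
    result: dict[str, list[float] | None] = {}
    for name, sample_col in zip(samples, cols[9:]):
        values = sample_col.split(":")
        # Trailing FORMAT fields may be dropped, and a missing AD is written as '.'
        if ad_idx >= len(values) or "." in values[ad_idx].split(","):
            result[name] = None
            continue
        ad = [int(x) for x in values[ad_idx].split(",")]
        total = sum(ad)
        result[name] = [a / total for a in ad[1:]] if total > 0 else None
    return result


line = "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:43,45:95\t0/0:30,0:30"
print(per_sample_vaf(line, ["tumor", "normal"]))
```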


Thank you for getting back!


Hi Prasun. I called my somatic variants using GATK Mutect2 in tumor with matched normal mode, so I have 4 VCF files, one per sample (tumor + blood DNA for each of the 4 samples). There are a number of command lines in the link you shared; which one should I use? Also, would it work for VCF files generated by Mutect2? Thanks for the help.


I'm sorry, I'm not very familiar with Mutect2 output, but it should be a regular VCF file. If it's a single-sample VCF file, you can use this command:

bcftools +fill-tags test/fill-tags-VAF.vcf -- -t VAF
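To sanity-check the result, the VAF written by the plugin can be compared with a value recomputed from AD in the same sample. This sketch assumes that the plugin reports VAF per ALT allele as AD[alt] / sum(AD); that formula is an assumption to verify against the bcftools documentation, and the helper name, tolerance and example values are hypothetical.

```python
def vaf_matches_ad(sample_col: str, fmt: str, tol: float = 1e-3) -> bool:
    """Check a sample's FORMAT/VAF against AD[alt] / sum(AD) recomputed from FORMAT/AD."""
    values = dict(zip(fmt.split(":"), sample_col.split(":")))
    ad = [int(x) for x in values["AD"].split(",")]
    reported = [float(x) for x in values["VAF"].split(",")]
    total = sum(ad)
    if total == 0:
        return False  # nothing to compare against
    expected = [a / total for a in ad[1:]]
    # VCF floats are usually written with limited precision, hence the tolerance
    return len(reported) == len(expected) and all(
        abs(r - e) <= tol for r, e in zip(reported, expected)
    )


# Hypothetical sample column after running fill-tags
print(vaf_matches_ad("0/1:43,45:0.511364", "GT:AD:VAF"))  # True
```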

