allele frequency in VCF
11 months ago
af15d938 ▴ 10

I am new to genomics, and have a question about what allele frequency in a VCF exactly refers to. I have a BAM file that represents sequencing done on one patient, and I want to understand the composition of SNPs (e.g. if the reference is A and the SNP is G, are 40% of the reads calling G, or 90%?). Is this what allele frequency is describing? Or is it saying that, in a population, if the allele frequency is 0.1, 10% of the population had the alternate allele? I promise I have tried googling this, but have not yet gotten an answer that completely makes sense.

vcf bam allele-frequency • 1.8k views
11 months ago
LauferVA 4.2k

The first place to look is the VCF / BCF file format specifications, which can be found at samtools.github.io, specifically here. Page 9 has a description of many fields, including:

AF {is an object of type} Float {representing} Allele frequency for each ALT allele in the same order as listed (estimated from primary data, not called genotypes)

To see if you can get more specific information, I'd look at the header of your VCF file next, and pull relevant information from it. At minimum, it should have a line like this:

##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
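If you want to pull these header fields out programmatically, here is a minimal sketch in plain Python. The function name `parse_info_header` is made up for illustration and is not part of any VCF library; it assumes a well-formed `##INFO=<...>` line with no escaped quotes inside quoted values.

```python
import re

def parse_info_header(line):
    """Parse one VCF '##INFO=<...>' header line into a dict (hypothetical helper)."""
    match = re.fullmatch(r'##INFO=<(.*)>', line.strip())
    if match is None:
        raise ValueError("not an INFO header line")
    # Each entry is key=value; a value is either a double-quoted string
    # (which may contain commas) or a run of characters up to the next comma.
    pairs = re.findall(r'(\w+)=("[^"]*"|[^,]*)', match.group(1))
    return {key: value.strip('"') for key, value in pairs}

af = parse_info_header(
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">'
)
print(af["ID"], af["Number"], af["Type"], af["Description"])
```

`Number=A` means one value per ALT allele, which matches the spec wording quoted above. Any extra keys a header happens to carry end up in the same dict.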

However, depending on how the VCF was created, it may have more info than that. For instance, for the Allele ID field, here is an example that includes additional info about the source and version of the database used to assign the fields:

##INFO=<ID=ALLELEID,Number=A,Type=String,Description="Allele ID",Source="ClinVar",Version="20220804">

Does your VCF header include any helpful info. in this regard? Good luck / let us know.

11 months ago
Vic ▴ 100

I think a basic explanation of allele frequencies would be useful. An allele frequency is a way of estimating how often an allele appears in a population. In a single person (a diploid organism), the frequency of an allele can only be 1, 0.5 or 0, because they can have two ref, one ref and one alt, or two alt. As frequencies, a heterozygote (one copy of each allele) is 0.5 and 0.5, and a homozygote is 1 and 0. So if you are just looking at genotype calls for one diploid individual, it’s simple in the sense that they are either homozygous or heterozygous at each site. An individual human only has two copies of each autosomal chromosome.

Yes, your population example is correct and allele frequencies are usually used in population level analysis.

I think you can use vcftools to estimate allele frequencies across all individuals in the VCF file and all sites using the --freq option, which “Outputs the allele frequency for each site in a file with the suffix ".frq"”:

vcftools --vcf my_file.vcf --freq --out my_freq_out_file 

For a population, imagine I sampled 3 diploid individuals, and those 3 individuals are in my VCF. Two of them are homozygous for the reference allele (so 4 reference chromosomes across the two individuals) and one is heterozygous (one reference, one alt). The allele counts at the population level are then 5 ref and 1 alt out of 6 chromosomes, so the frequencies are 5/6 = 0.833 and 1/6 = 0.167.
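To make that arithmetic concrete, here is a small sketch in plain Python that counts alleles from genotype strings. The function name `allele_frequencies` and the labels "ref" and "alt" are just for illustration; it assumes VCF-style GT strings such as 0/0 or 0/1, and it skips missing calls written as ".".

```python
from collections import Counter

def allele_frequencies(genotypes, alleles):
    """Count alleles across GT strings and return (frequencies, n_chromosomes)."""
    counts = Counter()
    for gt in genotypes:
        # Treat phased (|) and unphased (/) genotypes the same way.
        for idx in gt.replace("|", "/").split("/"):
            if idx != ".":  # skip missing allele calls
                counts[alleles[int(idx)]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no called alleles")
    return {a: counts[a] / total for a in alleles}, total

# Two homozygous-reference individuals and one heterozygote.
freqs, n_chr = allele_frequencies(["0/0", "0/0", "0/1"], ["ref", "alt"])
print(n_chr, round(freqs["ref"], 3), round(freqs["alt"], 3))
```

Note that this counts called genotypes, not reads, so it is the population-style frequency and not the fraction of reads supporting an allele in one sample.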

The freq output might look something like this:

CHROM    POS     N_ALLELES   N_CHR   {ALLELE:FREQ}
CHROM1  12       2           6       T:0.83333  C:0.16666
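If you want to read such a line back into Python, a rough sketch follows. It assumes whitespace-separated columns in the order shown above, with every column after N_CHR being an ALLELE:FREQ pair; `parse_frq_line` is a made-up helper name, not a vcftools function.

```python
def parse_frq_line(line):
    """Parse one data line of a .frq-style table (hypothetical helper)."""
    fields = line.split()
    record = {
        "chrom": fields[0],
        "pos": int(fields[1]),
        "n_alleles": int(fields[2]),
        "n_chr": int(fields[3]),
        "freqs": {},
    }
    # Remaining columns are ALLELE:FREQ pairs, one per allele.
    for item in fields[4:]:
        allele, freq = item.rsplit(":", 1)
        record["freqs"][allele] = float(freq)
    return record

rec = parse_frq_line("CHROM1  12       2           6       T:0.83333  C:0.16666")
print(rec["n_chr"], rec["freqs"])
```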

Hope that helps.
