Summarising Whole Genome Allele Frequency Spectrum from VCF file
6.9 years ago
Rubal ▴ 340

Hello Everyone,

I have a VCF with multiple individuals from multiple populations, and I would like to get a summary of the allele frequency spectrum for each population. I know that VCFtools has some nice options for outputting allele frequencies. However, my data are from non-model organisms, and I think treating the reference allele as ancestral (and the alternate allele as derived) when calculating allele frequencies is introducing some serious biases.

So I am looking for two possible solutions:

1) How to calculate the folded allele frequency spectrum (not biased by ancestral/reference allele assumptions), starting from a VCF file.

2) How to calculate the allele frequency spectrum using an outgroup species to infer the ancestral allele. Are there any packages out there for this? It does not seem trivial to me to infer which allele is presumed ancestral and to incorporate this into the VCF file, so that the allele frequency spectrum can then be calculated from ancestral/derived alleles.

Best

Rubal

genome allele frequency vcf pop gen • 7.3k views
6.9 years ago

This can easily be accomplished via VCFlib.

If the end goal is association testing, have a look at GPAT: https://github.com/jewmanchue/vcflib/wiki

EDIT:

GPAT++ now supports population summary statistics: https://github.com/jewmanchue/vcflib/wiki/Basic-population-statistics-with-GPAT

6.9 years ago

The first command extracts the allele frequency (AF) from the INFO field; sites with more than one alternative allele are skipped. The second command folds the frequency, replacing any value above 0.5 with one minus that value.

perl -ne 'next if /^#/; print "$1\n" if /[\t;]AF=([^;\t,]+)(?:[;\t]|$)/' your.vcf | perl -lne '$z = $_; $z = 1 - $z if $z > 0.5; print $z'
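
The AF value in INFO usually describes every sample in the file, so it cannot give a per-population spectrum. One alternative is to count alleles directly from the genotype columns of a chosen set of individuals. Below is a minimal Python sketch of that idea, not a tested tool: the function name `folded_sfs` and the sample names are made up for illustration, and it assumes biallelic records with GT as the first FORMAT subfield. Sites with a missing call in any chosen sample are skipped so that every site has the same sample size.

```python
from collections import Counter
import re

def folded_sfs(vcf_lines, samples):
    """Count sites by minor allele count among the chosen samples.

    Assumes biallelic records with GT as the first FORMAT subfield.
    Sites with a missing call in any chosen sample are skipped.
    """
    sfs = Counter()
    cols = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map each requested sample name to its column index.
            cols = [fields.index(s) for s in samples]
            continue
        if cols is None:
            raise ValueError("no #CHROM header line before the first record")
        if "," in fields[4]:            # more than one ALT allele: skip
            continue
        alleles = []
        for c in cols:
            gt = fields[c].split(":")[0]
            alleles.extend(re.split(r"[/|]", gt))
        if "." in alleles:              # missing call: skip the site
            continue
        n = len(alleles)
        alt = sum(a == "1" for a in alleles)
        sfs[min(alt, n - alt)] += 1     # fold: count the rarer allele
    return sfs

# Tiny in-memory example; a real file would be passed as open("your.vcf").
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3",
    "chr1\t10\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\t0/0",
    "chr1\t20\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1",
    "chr1\t30\t.\tG\tA,C\t.\tPASS\t.\tGT\t0/1\t0/2\t0/0",
    "chr1\t40\t.\tT\tC\t.\tPASS\t.\tGT\t./.\t0/1\t0/0",
]
print(sorted(folded_sfs(demo, ["ind1", "ind2"]).items()))  # [(1, 2)]
```

Bin 0 holds sites that are monomorphic within the chosen samples; drop it if only segregating sites are wanted. Running the function once per population, each with its own sample list, gives one folded spectrum per population.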


If you have an outgroup, it will take a little more scripting and thought. For example, the outgroup might not always carry the ancestral allele.
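
As a rough illustration of the scripting involved, here is a minimal Python sketch of one conservative approach. It assumes the outgroup is present as a sample column in the same VCF, and it polarizes a site only when the outgroup is homozygous for one of the two alleles. The function name `unfolded_sfs` and the sample names are hypothetical. It does not address the caveat above: if the outgroup itself carries a derived allele, the site is mispolarized, and handling that needs more than one outgroup or a probabilistic method.

```python
from collections import Counter
import re

def gt_alleles(fields, col):
    """Return the allele codes of one sample, e.g. '0|1' -> ['0', '1']."""
    return re.split(r"[/|]", fields[col].split(":")[0])

def unfolded_sfs(vcf_lines, ingroup, outgroup):
    """Count sites by derived allele count in the ingroup samples.

    The ancestral allele is taken from the outgroup sample, and only
    when that sample is homozygous (or haploid) for REF or ALT.
    Assumes biallelic records with GT as the first FORMAT subfield.
    """
    sfs = Counter()
    in_cols = out_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            in_cols = [fields.index(s) for s in ingroup]
            out_col = fields.index(outgroup)
            continue
        if in_cols is None:
            raise ValueError("no #CHROM header line before the first record")
        if "," in fields[4]:                 # multiallelic: skip
            continue
        out = set(gt_alleles(fields, out_col))
        if out not in ({"0"}, {"1"}):        # heterozygous or missing outgroup
            continue
        ancestral = next(iter(out))
        calls = [a for c in in_cols for a in gt_alleles(fields, c)]
        if "." in calls:                     # missing ingroup call: skip
            continue
        sfs[sum(a != ancestral for a in calls)] += 1
    return sfs

demo = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tout1",
    "chr1\t10\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\t0/0",
    "chr1\t20\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1",
    "chr1\t30\t.\tG\tA\t.\tPASS\t.\tGT\t0/1\t0/0\t0/1",
    "chr1\t40\t.\tT\tC\t.\tPASS\t.\tGT\t0/1\t0/0\t./.",
    "chr1\t50\t.\tT\tC\t.\tPASS\t.\tGT\t0/1\t0/0\t0/0",
]
print(sorted(unfolded_sfs(demo, ["ind1", "ind2"], "out1").items()))  # [(1, 1), (3, 2)]
```

At position 20 the outgroup is homozygous for ALT, so the reference allele is counted as derived there. That is the case where treating REF as ancestral would bias the spectrum.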


Thanks, this looks great. Is there a way to specify which individuals in the VCF file I want to include in the calculation?

Cheers