Question: Summarising Whole Genome Allele Frequency Spectrum from VCF file
5.6 years ago, Rubal wrote:

Hello Everyone,

I have a VCF file with multiple individuals from multiple populations, and I would like to get a summary of the allele frequency spectrum for each population. I know that VCFtools has some nice options for outputting allele frequencies. However, my data are from non-model organisms, and I think that treating the reference allele as the ancestral allele when calculating allele frequencies is introducing some serious biases.

So I am looking for two possible solutions:

1) How to calculate the folded allele frequency spectrum (not biased by ancestral/reference allele assumptions), starting from a VCF file.

2) How to calculate the allele frequency spectrum using an outgroup species to infer the ancestral allele. Are there any packages out there for this? It does not seem trivial to me to infer which allele is presumed ancestral and to incorporate this into the VCF file to then calculate allele frequency spectrum based on ancestral/derived alleles.


Any comments or ideas are much appreciated. Thank you in advance!



5.5 years ago, Zev.Kronenberg (United States) wrote:

This can easily be accomplished via VCFlib:

If the end goal is association testing have a look at GPAT:


GPAT++ now supports population summary statistics:





5.6 years ago, Zev.Kronenberg (United States) wrote:

The first command extracts the allele frequency (AF) from the INFO field, skipping any site with more than one alternative allele. The second command folds the allele frequency: any value above 0.5 is replaced by one minus that value.

perl -ne 'next if /^#/; print "$1\n" if /[\t;]AF=([^;\t,\n]+)(?=[;\t\n]|$)/' your.vcf | perl -lne '$z = $_; $z = 1 - $z if $z > 0.5; print $z'
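
For anyone who prefers Python, here is a minimal sketch of the same idea that also bins the folded frequencies into a histogram, which is one way to summarise the spectrum. It assumes the VCF carries an AF tag in the INFO column; the function names and the number of bins are my own choices for illustration and do not come from any library.

```python
import re
from collections import Counter

# Match AF only as a whole INFO key, so that tags such as MAF are ignored.
AF_PATTERN = re.compile(r"(?:^|;)AF=([^;]+)")


def folded_frequencies(vcf_lines):
    """Yield the folded (minor) allele frequency of each biallelic site."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # no INFO column
        match = AF_PATTERN.search(fields[7])
        if match is None or "," in match.group(1):
            continue  # no AF tag, or more than one alternative allele
        try:
            af = float(match.group(1))
        except ValueError:
            continue  # AF is missing or not a number
        yield min(af, 1.0 - af)


def spectrum(freqs, n_bins=10):
    """Count sites per equal-width bin of folded frequency over [0, 0.5]."""
    counts = Counter()
    for f in freqs:
        # The top bin is closed so that a frequency of exactly 0.5 is counted.
        counts[min(int(f / 0.5 * n_bins), n_bins - 1)] += 1
    return [counts[i] for i in range(n_bins)]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        print(spectrum(folded_frequencies(handle)))
```

Run it as `python fold_af.py your.vcf` (the script name is arbitrary). Bear in mind that the AF tag normally describes all samples in the file, so this gives one spectrum for the whole VCF rather than one per population.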

If you have an outgroup, it will take a little more scripting and thought. For example, the outgroup might not always carry the ancestral allele.
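
As a rough sketch of the polarisation step, assuming you have already worked out which base the outgroup carries at each site (for example from an alignment; that part is not shown here), the logic could look like the following. The function name and arguments are hypothetical. It simply treats the outgroup allele as ancestral, which, as noted above, will not always be true, and it gives up on sites where the outgroup matches neither allele.

```python
def derived_allele_frequency(ref, alt, alt_freq, outgroup_allele):
    """Return the derived allele frequency at a biallelic site.

    Assumption: the outgroup allele is taken to be ancestral.
    Returns None when the site cannot be polarised.
    """
    if outgroup_allele is None:
        return None  # no outgroup call at this site
    outgroup_allele = outgroup_allele.upper()
    if outgroup_allele == ref.upper():
        return alt_freq  # REF is ancestral, so ALT is derived
    if outgroup_allele == alt.upper():
        return 1.0 - alt_freq  # ALT is ancestral, so REF is derived
    return None  # outgroup carries a third allele


if __name__ == "__main__":
    print(derived_allele_frequency("A", "G", 0.25, "G"))
```

Sites that return None would have to be dropped or handled in some other way, so the unfolded spectrum will usually be built from fewer sites than the folded one.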


Thanks, this looks great. Is there a way to specify which individuals in the VCF file to include in the calculation?


Reply written 5.5 years ago by Rubal
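
On restricting the calculation to particular individuals: the AF tag in INFO normally describes every sample in the file, so one option is to recompute the frequency from the genotype columns of the samples you want. The sketch below does this for a single record. It assumes a GT entry in the FORMAT column and biallelic sites, and the function name is made up for illustration.

```python
def subset_alt_frequency(header_line, record_line, wanted_samples):
    """ALT allele frequency at one biallelic site, using only wanted_samples.

    header_line is the '#CHROM ...' line that names the samples.
    Returns None for multiallelic sites or when no genotype is called.
    """
    names = header_line.rstrip("\n").split("\t")[9:]
    fields = record_line.rstrip("\n").split("\t")
    if "," in fields[4]:
        return None  # more than one alternative allele
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return None
    gt_index = fmt.index("GT")
    alt_count = total = 0
    for name, sample in zip(names, fields[9:]):
        if name not in wanted_samples:
            continue
        parts = sample.split(":")
        genotype = parts[gt_index] if gt_index < len(parts) else "."
        # Phased (|) and unphased (/) genotypes are counted the same way.
        for allele in genotype.replace("|", "/").split("/"):
            if allele in ("0", "1"):  # missing alleles (".") are ignored
                total += 1
                alt_count += allele == "1"
    return alt_count / total if total else None
```

Calling this once per population, with that population's sample names as `wanted_samples`, and then folding each value as in the answer above would give one folded spectrum per population.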