Summarising Whole Genome Allele Frequency Spectrum from VCF file
9.9 years ago
Rubal ▴ 350

Hello Everyone,

I have a VCF with multiple individuals from multiple populations, and I would like to get a summary of the allele frequency spectrum for each population. I know that VCFtools has some nice options for outputting allele frequencies. However, my data are from non-model organisms, and I think that treating the reference allele as ancestral and the alternate allele as derived when calculating allele frequencies introduces serious biases.

So I am looking for two possible solutions:

  1. How to calculate the folded allele frequency spectrum (not biased by ancestral/reference allele assumptions), starting from a VCF file.
  2. How to calculate the allele frequency spectrum using an outgroup species to infer the ancestral allele. Are there any packages out there for this? It does not seem trivial to me to infer which allele is presumed ancestral and to incorporate this into the VCF file to then calculate allele frequency spectrum based on ancestral/derived alleles.

Any comments or ideas are much appreciated. Thank you in advance!

Best
Rubal

9.9 years ago

This can easily be accomplished with VCFlib.

If the end goal is association testing have a look at GPAT: https://github.com/jewmanchue/vcflib/wiki

EDIT:

GPAT++ now supports population summary statistics: https://github.com/jewmanchue/vcflib/wiki/Basic-population-statistics-with-GPAT

9.9 years ago

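The command below reads the allele frequency (AF) from the INFO field of each record and folds it, so that values above 0.5 become 1 - AF. Header lines and sites with more than one alternative allele are skipped. It also handles AF being the last INFO entry (no trailing semicolon) and does not confuse AF with tags such as MAF.

perl -ne 'next if /^#/; next unless /[\t;]AF=([^;\s]+)/; my $af = $1; next if $af =~ /,/ or $af eq "."; $af = 1 - $af if $af > 0.5; print "$af\n"' your.vcf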

If you have an outgroup, it will take a little more scripting and thought. For example, the outgroup might not always carry the ancestral allele.
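One way to handle the outgroup case, as a rough sketch and not a tested pipeline: assume the outgroup is a sample column in the same VCF, treat its allele as ancestral only when it is called and homozygous, and skip every other site. The function names and the `outgroup`/`ingroup` arguments are made up for illustration. A single outgroup can still mislead when it carries a derived or recurrent mutation, so this only covers the simplest case.

```python
def parse_gt(sample_field):
    """Return allele indices from a sample field such as '0/1:35' or '1|1'.

    Assumes GT is the first FORMAT entry; missing alleles ('.') become None.
    """
    gt = sample_field.split(":")[0].replace("|", "/")
    return [None if a == "." else int(a) for a in gt.split("/")]


def derived_allele_count(line, sample_names, outgroup, ingroup):
    """Polarise one biallelic VCF data line using a single outgroup sample.

    sample_names: sample names in #CHROM header order (hypothetical input).
    Returns (derived_count, called_chromosomes) for the ingroup samples,
    or None when the site cannot be polarised under these assumptions.
    """
    fields = line.rstrip("\n").split("\t")
    if "," in fields[4]:
        return None  # more than one alternative allele: skip
    genotypes = dict(zip(sample_names, fields[9:]))
    out = parse_gt(genotypes[outgroup])
    if None in out or len(set(out)) != 1:
        return None  # outgroup missing or heterozygous: ancestral state unclear
    ancestral = out[0]
    alleles = [a for s in ingroup for a in parse_gt(genotypes[s]) if a is not None]
    if not alleles:
        return None  # no called ingroup genotypes at this site
    derived = sum(1 for a in alleles if a != ancestral)
    return derived, len(alleles)
```

Tallying the returned derived counts per population would give an unfolded spectrum; sites returning None are simply left out, which reduces the number of usable sites.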

Thanks, this looks great. Is there a way to specify which individuals in the VCF file I want to include in the calculation?

Cheers
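For restricting the calculation to particular individuals, one option is to count alleles from the genotype columns instead of using the INFO AF value, which describes all samples in the file. Below is a minimal Python sketch. It assumes GT is the first FORMAT entry, that `keep` is a set of sample names taken from the #CHROM header line, and that sites with any missing genotype among the kept samples can be dropped. `folded_sfs` is a hypothetical helper name, not part of any package.

```python
from collections import Counter


def folded_sfs(vcf_lines, keep):
    """Count biallelic sites by minor allele count among the samples in `keep`.

    vcf_lines: iterable of VCF text lines (for example an open file handle).
    Returns a Counter mapping minor allele count -> number of sites.
    Monomorphic sites within the subset land in class 0.
    """
    sfs = Counter()
    columns = None
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Remember which columns belong to the requested samples.
            columns = [i for i, name in enumerate(fields) if i >= 9 and name in keep]
            continue
        if columns is None or "," in fields[4]:
            continue  # no header seen yet, or more than one alternative allele
        alleles = []
        for i in columns:
            gt = fields[i].split(":")[0].replace("|", "/")
            alleles.extend(gt.split("/"))
        if not alleles or "." in alleles:
            continue  # drop sites with missing data so sample size stays constant
        alt = sum(1 for a in alleles if a != "0")
        # Folding: use the rarer allele, whichever one it is.
        sfs[min(alt, len(alleles) - alt)] += 1
    return sfs
```

Calling `folded_sfs(open("your.vcf"), {"ind1", "ind2"})`, with `ind1` and `ind2` standing in for real sample names, would return the spectrum for those two individuals. Running it once per population, each time with that population's sample names, gives one folded spectrum per population.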
