Question: Summarising Whole Genome Allele Frequency Spectrum from VCF file
6.4 years ago by
Rubal330
Germany
Rubal330 wrote:

Hello Everyone,

I have a VCF with multiple individuals from multiple populations, and I would like to get a summary of the allele frequency spectrum for each population. I know that VCFtools has some nice options for outputting allele frequencies. However, my data are from non-model organisms, and I think that calculating allele frequencies relative to the reference allele (in effect treating it as ancestral) is introducing some serious biases.

So I am looking for two possible solutions:

1) How to calculate the folded allele frequency spectrum (not biased by ancestral/reference allele assumptions) with a VCF file as the starting point.

2) How to calculate the allele frequency spectrum using an outgroup species to infer the ancestral allele. Are there any packages out there for this? It does not seem trivial to me to infer which allele is ancestral and to incorporate this into the VCF file so that the allele frequency spectrum can then be calculated from ancestral/derived alleles.

Any comments or ideas are much appreciated. Thank you in advance!

Best

Rubal

modified 6.4 years ago by Zev.Kronenberg11k • written 6.4 years ago by Rubal330
6.4 years ago by
United States
Zev.Kronenberg11k wrote:

This can easily be accomplished with VCFlib.

If the end goal is association testing, have a look at GPAT: https://github.com/jewmanchue/vcflib/wiki

EDIT:

GPAT++ now supports population summary statistics: https://github.com/jewmanchue/vcflib/wiki/Basic-population-statistics-with-GPAT

modified 9 months ago by RamRS30k • written 6.4 years ago by Zev.Kronenberg11k
6.4 years ago by
United States
Zev.Kronenberg11k wrote:

The first command extracts the allele frequency (AF) from the INFO field, skipping header lines and any site with more than one alternative allele. The second command folds the allele frequency.

perl -F'\t' -lane 'next if /^#/; print $1 if $F[7] =~ /(?:^|;)AF=([^;,]+)(?:;|$)/' your.vcf | perl -lne '$z = $_; $z = 1 - $z if $z > 0.5; print $z'
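The AF tag is typically computed over every sample in the file, so it cannot give a per-population spectrum. A minimal Python sketch of one alternative is to count alleles directly from the GT fields of a chosen set of samples and bin sites by minor allele count. The function name `folded_sfs` and its arguments are hypothetical, and the sketch assumes a plain-text (uncompressed) VCF, biallelic sites, and that sites with any missing genotype among the chosen samples are dropped so that all sites share the same sample size.

```python
from collections import Counter


def folded_sfs(vcf_lines, samples):
    """Return {minor_allele_count: n_sites} using only the named samples.

    Hypothetical helper, not part of any library. Biallelic sites only;
    sites with a missing genotype in any chosen sample are skipped.
    """
    sfs = Counter()
    cols = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map the requested sample names to their column indices.
            cols = [fields.index(name) for name in samples]
            continue
        if cols is None:
            raise ValueError("no #CHROM header line before the first record")
        if fields[4] == "." or "," in fields[4]:
            continue  # no ALT allele, or more than one ALT allele
        gt_idx = fields[8].split(":").index("GT")
        alleles = []
        for col in cols:
            gt = fields[col].split(":")[gt_idx].replace("|", "/")
            alleles.extend(gt.split("/"))
        if any(a not in ("0", "1") for a in alleles):
            continue  # missing call in at least one chosen sample
        alt = alleles.count("1")
        minor = min(alt, len(alleles) - alt)  # fold: count the rarer allele
        if minor > 0:  # keep only sites that segregate in this subset
            sfs[minor] += 1
    return sfs
```

Calling it once per population, for example `folded_sfs(open("your.vcf"), ["ind1", "ind2"])` with placeholder sample names, gives one spectrum per list of individuals.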

If you have an outgroup, it will take a little more scripting and thought. For example, the outgroup might not always carry the ancestral allele.
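As a rough illustration of the outgroup idea, the Python sketch below polarises each biallelic site with a single outgroup sample that is assumed to be a column in the same VCF. The function name `unfolded_sfs` and its arguments are hypothetical. The allele for which the outgroup is homozygous is taken as ancestral, and sites where the outgroup is missing or heterozygous are skipped. This is a simplification: as noted above, the outgroup does not always carry the ancestral allele, so some sites will be mispolarised.

```python
from collections import Counter


def _alleles(field, gt_idx):
    """Split one sample field into its GT allele codes, e.g. ['0', '1']."""
    return field.split(":")[gt_idx].replace("|", "/").split("/")


def unfolded_sfs(vcf_lines, ingroup, outgroup):
    """Return {derived_allele_count: n_sites} for the ingroup samples.

    Hypothetical helper. Ancestral state is assumed to be the allele
    for which the single outgroup sample is homozygous.
    """
    sfs = Counter()
    in_cols = out_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        f = line.split("\t")
        if line.startswith("#CHROM"):
            in_cols = [f.index(name) for name in ingroup]
            out_col = f.index(outgroup)
            continue
        if in_cols is None:
            raise ValueError("no #CHROM header line before the first record")
        if f[4] == "." or "," in f[4]:
            continue  # biallelic sites only
        gt_idx = f[8].split(":").index("GT")
        out = set(_alleles(f[out_col], gt_idx))
        if out != {"0"} and out != {"1"}:
            continue  # outgroup missing or heterozygous: state unknown
        ancestral = out.pop()
        calls = [a for c in in_cols for a in _alleles(f[c], gt_idx)]
        if any(a not in ("0", "1") for a in calls):
            continue  # missing call in the ingroup
        derived = sum(a != ancestral for a in calls)
        if 0 < derived < len(calls):  # segregating sites only
            sfs[derived] += 1
    return sfs
```

Sites where the derived allele is fixed in the ingroup are left out here because they are usually tallied separately as fixed differences; drop the upper bound in the last condition if you want them included.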

modified 9 months ago by RamRS30k • written 6.4 years ago by Zev.Kronenberg11k

Thanks, this looks great. Is there a way to specify which individuals in the VCF file to include in the calculation?

Cheers

written 6.4 years ago by Rubal330