Calculate allele frequency from many VCF files at a specific locus
4.1 years ago
John

Dear all,

I have 100 VCF files (one per sample) and would like to calculate allele frequencies at specific sites.

At one locus (called with the GATK Best Practices workflow), I have three genotypes:

rs-xxxxx:
A/A occurring in 30 samples (ref hom)
A/G occurring in 21 samples (het)
G/G occurring in 49 samples (alt hom)

The genotype frequencies are therefore:

A/A = 0.3
A/G = 0.21
G/G = 0.49

But how do I calculate the allele frequencies at this A/G site?

dbSNP defines it as: (sum of chromosome counts carrying the allele over all members) / (total chromosome count over all members).
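Applying that definition to the counts above: each diploid sample contributes two chromosomes, so 100 samples give 200 chromosomes. A homozygote carries two copies of its allele and a heterozygote one copy of each, so freq(A) = (2*30 + 21) / 200 and freq(G) = (2*49 + 21) / 200. Equivalently, freq(A) = f(A/A) + f(A/G)/2 = 0.3 + 0.105. A minimal sketch, assuming a biallelic site and diploid samples (the function name is only illustrative):

```python
def allele_frequencies(n_ref_hom, n_het, n_alt_hom):
    """Return (ref_freq, alt_freq) for a biallelic site in diploid samples."""
    # Every diploid sample contributes two chromosomes.
    total_chromosomes = 2 * (n_ref_hom + n_het + n_alt_hom)
    # Each ref homozygote carries 2 ref alleles; each het carries 1.
    ref_count = 2 * n_ref_hom + n_het
    # Each alt homozygote carries 2 alt alleles; each het carries 1.
    alt_count = 2 * n_alt_hom + n_het
    return ref_count / total_chromosomes, alt_count / total_chromosomes


ref_af, alt_af = allele_frequencies(n_ref_hom=30, n_het=21, n_alt_hom=49)
print(f"A (ref): {ref_af:.3f}")  # A (ref): 0.405
print(f"G (alt): {alt_af:.3f}")  # G (alt): 0.595
```

The two frequencies always sum to 1 at a biallelic site, which is a quick sanity check.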

Thanks in advance for any instructive example.

Paul.


Hello, I am also stuck on this. Could you please suggest how to approach it?

4.1 years ago

Use bcftools merge to combine your VCFs at this position, then extract the INFO/AF field with bcftools query.
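If the merged file's INFO/AF is not recomputed across all samples, one option is bcftools +fill-tags (for example bcftools +fill-tags merged.vcf.gz -- -t AF, where merged.vcf.gz is a hypothetical file name). Alternatively, the frequency can be counted directly from the per-sample genotypes. The sketch below assumes output from bcftools query -f '%CHROM\t%POS[\t%GT]\n', a biallelic site, and that missing calls (".") should simply be skipped; the function name is hypothetical:

```python
import re


def alt_allele_frequency(query_line):
    r"""Alternate-allele frequency from one line of
    `bcftools query -f '%CHROM\t%POS[\t%GT]\n'` output.

    Assumes a biallelic site (alleles coded 0 and 1) and skips
    missing calls ('.').
    """
    fields = query_line.rstrip("\n").split("\t")
    chrom, pos, genotypes = fields[0], int(fields[1]), fields[2:]
    alt = total = 0
    for gt in genotypes:
        # Genotypes are separated by '/' (unphased) or '|' (phased).
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # missing call: contributes no chromosome
            total += 1
            if allele != "0":
                alt += 1
    return chrom, pos, (alt / total if total else float("nan"))


# Synthetic line: one 0/0, one 0/1, one 1|1 and one missing sample.
print(alt_allele_frequency("chr1\t12345\t0/0\t0/1\t1|1\t./."))
# ('chr1', 12345, 0.5)
```

Each line of the real query output can be fed to the function in turn; the result is the same allele-counting ratio that dbSNP describes.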

ADD COMMENT
1
Entering edit mode

Yes, bcftools merge is a good idea; GATK CombineGVCFs would also work. bcftools query -f '%CHROM %POS[\t%DP\t%AD]\n' gives you DP and AD, from which you could calculate frequencies. But I want the frequency across all samples, i.e. a single number as in dbSNP. What about dividing the count of one genotype by the sum of the other genotypes?
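Two distinctions may help here. Dividing one genotype count by the sum of the others (e.g. 30 / (21 + 49), about 0.43) is a ratio of genotype counts, not an allele frequency (0.405 for A at this site). And DP/AD give read-level allele fractions within each sample, which describe how well each call is supported rather than the population frequency. A sketch of the latter, assuming the exact format string above (space between CHROM and POS, tabs elsewhere), a biallelic site so AD is "ref_depth,alt_depth", and treating missing or zero-depth AD as None; the function name is hypothetical:

```python
def read_allele_fractions(query_line):
    r"""Per-sample alt-read fractions from one line of
    `bcftools query -f '%CHROM %POS[\t%DP\t%AD]\n'` output.

    Assumes a biallelic site, so each AD value is "ref_depth,alt_depth".
    Samples with missing AD ('.') or zero total depth yield None.
    """
    fields = query_line.rstrip("\n").split("\t")
    chrom, pos = fields[0].split(" ")
    per_sample = fields[1:]
    fractions = []
    # Per-sample fields come in (DP, AD) pairs; DP itself is not needed here.
    for _dp, ad in zip(per_sample[0::2], per_sample[1::2]):
        parts = ad.split(",")
        if len(parts) < 2 or "." in parts[:2]:
            fractions.append(None)  # missing allele depths
            continue
        ref_depth, alt_depth = int(parts[0]), int(parts[1])
        depth = ref_depth + alt_depth
        fractions.append(alt_depth / depth if depth else None)
    return chrom, int(pos), fractions


print(read_allele_fractions("chr1 12345\t30\t15,15\t20\t0,20\t0\t."))
# ('chr1', 12345, [0.5, 1.0, None])
```

A heterozygous call would typically show a fraction near 0.5 and an alt homozygote near 1.0; the single dbSNP-style number still comes from counting alleles across the genotypes.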

ADD REPLY
