Question: Summarise genotypes in a VCF file
Dave Tang (Australia) wrote, 4.4 years ago:

Dear all,

I have created a multi-sample VCF file from HapMap data that contains only genotypes (GT). I am looking for a program/tool that can calculate the allele frequencies (AF), alternate allele counts (AC), and total allele numbers (AN), and then add them to the INFO field. It's not that difficult to script up, but I was wondering if a tool already exists to do this.

Thanks!

Dave

genotypes vcf • 1.8k views
Dave Tang (Australia) wrote, 3.0 years ago:

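Use the vcffixup tool from vcflib, available at https://github.com/vcflib/vcflib. The first command below shows a record from the original file; the second shows the same record after vcffixup has filled in the INFO field.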

cat output.vcf | grep -v "^#" | head -1
chr21   9889293 rs28676788 G    A       .       .       .       GT      0/0     ./.     0/0     0/0     0/0     0/0     0/0     0/0     0/0     1/0     0/0     1/0     1/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0    0/0      0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     1/0     1/0     1/0     0/0     0/0     0/0     0/0     0/0     0/0     1/0     0/0     0/0     0/0     0/0     0/0     0/0     ./.     0/0     0/0     0/0     0/0     1/0     1/0     0/0     0/0    ./.      0/0     0/0     ./.     1/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     1/0     0/0     0/0     0/0     1/0     1/0     1/0     0/0     0/0     1/0     1/0

vcffixup output.vcf | grep -v "^#" | head -1
chr21   9889293 rs28676788 G    A       0       .       AC=16;AF=0.101266;AN=158;NS=83  GT      0/0     ./.     0/0     0/0     0/0     0/0     0/0     0/0     0/0     1/0     0/0     1/0     1/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0    0/0      0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     1/0     1/0     1/0     0/0     0/0     0/0     0/0     0/0     0/0     1/0     0/0     0/0     0/0     0/0     0/0     0/0     ./.     0/0     0/0     0/0     0/0     1/0    1/0      0/0     0/0     ./.     0/0     0/0     ./.     1/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     0/0     1/0     0/0     0/0     0/0     1/0     1/0     1/0     0/0     0/0     1/0     1/0

NS=83 is the number of samples in the file. Four of them had no genotype (./.), which leaves 79 called samples, so AN = 79 * 2 = 158. AC is the alternate allele count (16 here, one for each 1/0 genotype), and AF is the alternate allele frequency, AC/AN = 16/158 = 0.101266.
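For anyone who would rather script it up, here is a minimal Python sketch of the same arithmetic. It is not how vcffixup is implemented; the function name `summarise_genotypes` is made up for illustration, and it assumes a biallelic site where every non-reference allele counts towards AC. The example input is built to match the record above (63 x 0/0, 16 x 1/0, 4 x ./.).

```python
def summarise_genotypes(genotypes):
    """Return (AC, AN, AF, n_called) for one site from a list of GT strings.

    Hypothetical helper: assumes a biallelic site, so any allele other
    than "0" or "." is counted as the alternate allele.
    """
    ac = 0        # alternate allele count
    an = 0        # total number of called alleles
    n_called = 0  # samples with at least one called allele
    for gt in genotypes:
        # Treat phased (|) and unphased (/) genotypes the same way.
        alleles = gt.replace("|", "/").split("/")
        called = [a for a in alleles if a != "."]
        if called:
            n_called += 1
        an += len(called)
        ac += sum(1 for a in called if a != "0")
    af = ac / an if an else 0.0
    return ac, an, af, n_called


# Same genotype composition as the rs28676788 record shown above.
example = ["0/0"] * 63 + ["1/0"] * 16 + ["./."] * 4
ac, an, af, n_called = summarise_genotypes(example)
print(ac, an, round(af, 6), n_called)  # 16 158 0.101266 79
```

Note that `n_called` is the 79 samples with a genotype, not the NS=83 value that vcffixup reported for this record.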
