Question: Count total number of Homozygous/Heterozygous SNPs from VCF file
9 months ago by
nataliagru1 wrote:

Dear Community,

I would like to count the total number of heterozygous and homozygous SNPs in my VCF file. I have read other forums, but I can't find an answer or guidance on how to do this.

Is there a simple way to calculate the total number of heterozygous SNPs and homozygous SNPs for a given VCF file? For example, I am working with 7 parasite genomes, which I mapped and then called variants on using GATK. For parasite_1.vcf, I would like to know what percentage of the called SNPs are homozygous and what percentage are heterozygous. I would like to summarize this information in a table like the one below.

Strain    SNPs     Homozygous   Heterozygous
strain1   11,091   7,857        3,234
strain2   10,772   6,355        4,367

Here, SNPs is the total SNP count, Homozygous is the total homozygous SNP count, and Heterozygous is the total heterozygous SNP count for a given strain's VCF file.

Any guidance or advice is greatly appreciated.

9 months ago by
Hannover Medical School
colindaven wrote:

There's a tool called vt (available via bioconda) that produces very nice summaries of VCF files.
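
If you would rather not install anything, the counts for the table in the question can also be sketched in plain Python. This is only a rough sketch, not what vt does internally, and it rests on several assumptions: each VCF holds a single sample (the first sample column is used), a SNP is a record whose REF and every ALT allele is a single base, homozygous-reference and missing calls are skipped, and a haploid non-reference call is counted as homozygous. The names `open_vcf` and `count_genotypes` are made up for this example.

```python
import gzip
import re
import sys

BASES = set("ACGTN")


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def count_genotypes(lines, sample_index=0):
    """Count homozygous-alternate and heterozygous SNP calls for one sample.

    Assumption: only records where REF and all ALT alleles are single bases
    count as SNPs; hom-ref (e.g. 0/0) and missing (e.g. ./.) calls are skipped.
    """
    counts = {"snps": 0, "homozygous": 0, "heterozygous": 0}
    for line in lines:
        if line.startswith("#"):
            continue  # meta-information and column header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 + sample_index:
            continue  # no genotype column for the requested sample
        ref = fields[3].upper()
        alts = fields[4].upper().split(",")
        if ref not in BASES or any(alt not in BASES for alt in alts):
            continue  # indel, symbolic allele, or other non-SNP record
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        sample = fields[9 + sample_index].split(":")
        alleles = re.split(r"[/|]", sample[fmt.index("GT")])
        if "." in alleles:
            continue  # missing or partially missing genotype
        if len(set(alleles)) > 1:
            counts["heterozygous"] += 1
        elif alleles[0] != "0":
            counts["homozygous"] += 1
        else:
            continue  # homozygous reference, not a variant in this sample
        counts["snps"] += 1
    return counts


if __name__ == "__main__":
    print("Strain\tSNPs\tHomozygous\tHeterozygous")
    for path in sys.argv[1:]:
        with open_vcf(path) as handle:
            c = count_genotypes(handle)
        print(f"{path}\t{c['snps']}\t{c['homozygous']}\t{c['heterozygous']}")
```

Saved as, say, `count_snps.py` (a hypothetical filename), it could be run as `python count_snps.py parasite_1.vcf parasite_2.vcf` to print one tab-separated row per file. Percentages follow by dividing each count by the SNPs column. Whether filtered records (FILTER other than PASS) should be counted is a choice the sketch leaves open; it counts them.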


Thank you for your comment. I also found this tool to summarize SNP data very nicely.[1]

written 9 months ago by nataliagru1
