Question: Count the total number of homozygous/heterozygous SNPs in a VCF file
Posted 11 weeks ago by nataliagru1 (reputation 50):

Dear Community,

I would like to count the total number of heterozygous and homozygous SNPs in my VCF file. I have read through threads on other forums, but I can't find an answer or any guidance on how to do this.

Is there a simple way to calculate the total number of heterozygous SNPs and the total number of homozygous SNPs in a given VCF file? For example, I am working with 7 parasite genomes that I have mapped and called variants on using GATK. For parasite_1.vcf, I would like to know what percentage of the called SNPs are homozygous and what percentage are heterozygous. I would like to summarize this information in a table like the one below.

    Strain     SNPs     Homozygous   Heterozygous
    strain1    11,091   7,857        3,234
    strain2    10,772   6,355        4,367

Here, SNPs is the total SNP count, Homozygous is the number of homozygous SNPs, and Heterozygous is the number of heterozygous SNPs in a given strain's VCF file.
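A minimal Python sketch (not from the original thread) that tallies these counts directly from the GT field of each record. It assumes one sample per strain VCF (sample column index 0) and defines a SNP as a record whose REF and every ALT allele are single bases A/C/G/T. "Homozygous" here means homozygous for a non-reference allele (e.g. 1/1), "heterozygous" means two different alleles (e.g. 0/1 or 1/2), and 0/0 and missing (./.) calls are not counted. The script name count_zygosity.py and all function names are hypothetical, and no FILTER-column filtering is applied.

```python
import gzip
import sys
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def is_snp(ref, alt):
    """True if REF and every ALT allele are single bases (A, C, G or T)."""
    return len(ref) == 1 and all(a in ("A", "C", "G", "T") for a in alt.split(","))


def classify_genotype(gt):
    """Return 'hom', 'het', or None for missing / homozygous-reference calls."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None              # missing call, e.g. ./.
    if len(set(alleles)) > 1:
        return "het"             # e.g. 0/1 or 1/2
    if alleles[0] == "0":
        return None              # 0/0: sample matches the reference here
    return "hom"                 # e.g. 1/1 (a haploid "1" is also counted here)


def count_snp_zygosity(lines, sample_index=0):
    """Count homozygous and heterozygous SNP calls for one sample column."""
    counts = Counter(hom=0, het=0)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue             # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if not is_snp(fields[3], fields[4]):
            continue             # skip indels, MNPs and '*' alleles
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        values = fields[9 + sample_index].split(":")
        kind = classify_genotype(values[fmt.index("GT")])
        if kind is not None:
            counts[kind] += 1
    counts["total"] = counts["hom"] + counts["het"]
    return counts


def format_row(strain, counts):
    """One table row: strain, total SNPs, hom and het counts, percent homozygous."""
    total = counts["total"]
    pct_hom = 100 * counts["hom"] / total if total else 0.0
    return (f"{strain:<10}{total:>8,}{counts['hom']:>12,}{counts['het']:>14,}"
            f"{pct_hom:>8.1f}")


if __name__ == "__main__":
    # Usage: python count_zygosity.py parasite_1.vcf parasite_2.vcf.gz ...
    print(f"{'Strain':<10}{'SNPs':>8}{'Homozygous':>12}{'Heterozygous':>14}{'%Hom':>8}")
    for path in sys.argv[1:]:
        with open_vcf(path) as fh:
            strain = path.split("/")[-1].split(".")[0]
            print(format_row(strain, count_snp_zygosity(fh)))
```

Passing all seven strain VCFs in one call prints one row per file. To count only filtered calls, one option is to also skip records whose FILTER column (fields[6]) is not PASS.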

Any guidance or advice is greatly appreciated.

Tags: snp, allele, vcf
Answer posted 11 weeks ago by colindaven (reputation 2.3k), Hannover Medical School:

There's a tool called vt (available via Bioconda) that produces very nice summaries of VCF files.


Thank you for your comment. I also found this tool to summarize SNP data very nicely.[1]

Reply posted 11 weeks ago by nataliagru1.
