Count total number of Homozygous/Heterozygous SNPs from VCF file
2.1 years ago
nataliagru1 ▴ 90

Dear Community,

I would like to count the total number of heterozygous and homozygous SNPs in my VCF file. I have read other forums, but I can't seem to find an answer or guidance on how to do this.

Is there a simple way to calculate the total heterozygous and total homozygous SNP counts for a given VCF file? For example, I am working with 7 parasite genomes that I have mapped and called variants on using GATK. For parasite_1.vcf, I would like to know what percentage of the called SNPs are homozygous or heterozygous. I would like to summarize this information in a table like the one below.

Strain    SNPs     Homozygous  Heterozygous
strain1   11,091   7,857       3,234
strain2   10,772   6,355       4,367
etc.


Here, SNPs is the total SNP count, Homozygous is the total homozygous SNP count, and Heterozygous is the total heterozygous SNP count for a given strain's VCF file.

Any guidance or advice is greatly appreciated.

SNP VCF allele • 1.8k views
2.1 years ago
colindaven ★ 3.9k

There's a tool called vt (available via bioconda) which can produce very nice summaries of VCF files.
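If you would rather see what such a count involves, here is a minimal Python sketch. It is not how vt works internally, just one way to tally genotypes by hand. The function name `count_snp_zygosity` is hypothetical, and the sketch assumes an uncompressed, single-sample VCF with a `GT` entry in the FORMAT column. It also assumes you only want non-reference calls: homozygous-reference (`0/0`) and missing (`./.`) genotypes are skipped, indels are ignored, and a multi-allelic call such as `1/2` is counted as heterozygous.

```python
import re
from collections import Counter

BASES = {"A", "C", "G", "T"}

def count_snp_zygosity(vcf_lines, sample_index=0):
    """Tally homozygous-alt and heterozygous SNP calls for one sample.

    vcf_lines: any iterable of VCF text lines (an open file works).
    sample_index: 0 for the first sample column (assumption: single-sample VCF).
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no genotype to read
        ref, alts = fields[3].upper(), fields[4].upper().split(",")
        # Keep SNPs only: REF and every ALT allele must be a single base
        if ref not in BASES or any(alt not in BASES for alt in alts):
            continue
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue
        gt = fields[9 + sample_index].split(":")[fmt_keys.index("GT")]
        alleles = re.split(r"[/|]", gt)  # handles unphased (/) and phased (|)
        if "." in alleles:
            continue  # missing genotype
        if len(set(alleles)) > 1:
            counts["heterozygous"] += 1
        elif alleles[0] != "0":
            # all alleles identical and non-reference (a haploid "1" lands here too)
            counts["homozygous"] += 1
    counts["snps"] = counts["homozygous"] + counts["heterozygous"]
    return counts

# Tiny made-up records for illustration only (not real data)
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tstrain1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:20",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:18",
    "chr1\t300\t.\tG\tGA\t50\tPASS\t.\tGT:DP\t0/1:15",
    "chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT:DP\t./.:0",
    "chr1\t500\t.\tA\tC,T\t50\tPASS\t.\tGT:DP\t1/2:22",
]
result = count_snp_zygosity(example)
print(dict(result))
```

To run it on a real file, pass an open file handle instead of the list, e.g. `with open("parasite_1.vcf") as fh: count_snp_zygosity(fh)`, once per strain, and collect the three counts into your table. For a bgzipped file you would need `gzip.open(path, "rt")` instead of `open`.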


Thank you for your comment. I also found RTG Tools, which summarizes SNP data very nicely: https://www.realtimegenomics.com/products/rtg-tools
