Count total number of Homozygous/Heterozygous SNPs from VCF file
2.1 years ago
nataliagru1 ▴ 90

Dear Community,

I would like to count the total number of heterozygous and homozygous SNPs in my VCF file. I have read other forum threads, but I can't find an answer or guidance on how to do this.

Is there a simple way to count the total heterozygous SNPs and total homozygous SNPs in a given VCF file? For example, I am working with 7 parasite genomes that I mapped and then called variants on using GATK. For parasite_1.vcf, I would like to know what percentage of the called SNPs are homozygous and what percentage are heterozygous. I would like to summarize this information in a table like the one below.

Strain    SNPs     Homozygous   Heterozygous
strain1   11,091   7,857        3,234
strain2   10,772   6,355        4,367
etc.

Here, SNPs is the total SNP count, Homozygous is the total homozygous SNP count, and Heterozygous is the total heterozygous SNP count for a given strain's VCF file.

Any guidance or advice is greatly appreciated.

SNP VCF allele • 1.8k views
2.1 years ago
colindaven ★ 3.9k

There's a tool called vt (available via bioconda) that produces very nice summaries of VCF files.
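If you would rather not install anything, a small Python script could also do the counting. This is only a minimal sketch under some assumptions: the VCF has a GT entry in the FORMAT column, a record counts as a SNP when REF and every ALT allele is a single base, and no-calls and homozygous-reference genotypes are skipped. The function name `count_snp_genotypes` and the `sample_index` parameter are made up for illustration and do not come from any tool.

```python
import gzip

BASES = set("ACGTN")


def count_snp_genotypes(vcf_path, sample_index=0):
    """Count homozygous and heterozygous SNP calls for one sample in a VCF."""
    counts = {"snps": 0, "homozygous": 0, "heterozygous": 0}
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 10 + sample_index:
                continue  # no genotype column for the requested sample
            ref, alts = fields[3], fields[4].split(",")
            # Assumption: a SNP has a single-base REF and single-base ALTs
            if ref.upper() not in BASES:
                continue
            if not all(alt.upper() in BASES for alt in alts):
                continue
            format_keys = fields[8].split(":")
            if "GT" not in format_keys:
                continue
            sample_values = fields[9 + sample_index].split(":")
            genotype = sample_values[format_keys.index("GT")]
            alleles = genotype.replace("|", "/").split("/")
            if "." in alleles:
                continue  # no-call
            if all(allele == "0" for allele in alleles):
                continue  # homozygous reference, not a variant in this sample
            counts["snps"] += 1
            # Note: a haploid call such as "1" is counted as homozygous here
            if len(set(alleles)) == 1:
                counts["homozygous"] += 1
            else:
                counts["heterozygous"] += 1
    return counts
```

Calling `count_snp_genotypes("parasite_1.vcf")` would return a dict with the three totals for the first sample; dividing `homozygous` or `heterozygous` by `snps` gives the percentages for the summary table. Whether to apply a FILTER or quality cutoff first is a separate choice that this sketch leaves out.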


Thank you for your comment. I also found this tool, which summarizes SNP data very nicely: https://www.realtimegenomics.com/products/rtg-tools
