I am trying to call SNPs across 150 individuals of a non-model species, genotyped by whole-genome (WGS) resequencing.
In short: I aligned the reads from each sample against the reference with BWA, then used bcftools mpileup to compute the read counts and bcftools call to call genotypes. I ran this on each sample separately and allowed consensus genotypes (i.e. identical to the reference) to be called. I then used bcftools merge to create a single VCF file containing all samples and filtered on missing rate.
I now want to apply a quality filter to remove genotypes with low read counts. The problem is that heterozygous genotypes usually have higher read counts than homozygous ones. As a result, filtering on read count (DP) produces a dataset in which it is rare to observe a SNP with all three genotypes, which doesn't make much sense...
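To make the pattern concrete, here is a minimal Python sketch of how the mean DP per genotype class could be tabulated from the merged VCF. It assumes an uncompressed VCF with diploid genotypes and a per-sample FORMAT/DP field next to GT; the function names and the file name `merged.vcf` are hypothetical and only for illustration, not part of my actual pipeline.

```python
from collections import defaultdict
from statistics import mean


def classify_gt(gt):
    """Return 'hom_ref', 'het' or 'hom_alt'; None for missing or non-diploid calls."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None
    if alleles[0] != alleles[1]:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def mean_dp_by_genotype(vcf_lines):
    """Average the per-sample DP values separately for each genotype class.

    Assumption: every record lists GT and DP in its FORMAT column (field 9).
    Records or samples without usable GT/DP values are skipped.
    """
    depths = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample columns
        keys = fields[8].split(":")
        if "GT" not in keys or "DP" not in keys:
            continue
        gt_i, dp_i = keys.index("GT"), keys.index("DP")
        for sample in fields[9:]:
            values = sample.split(":")
            if len(values) <= max(gt_i, dp_i):
                continue  # trailing fields were dropped for this sample
            cls = classify_gt(values[gt_i])
            if cls is None or not values[dp_i].isdigit():
                continue  # missing genotype or missing depth
            depths[cls].append(int(values[dp_i]))
    return {cls: mean(vals) for cls, vals in depths.items()}


if __name__ == "__main__":
    # "merged.vcf" is a placeholder path for the merged, uncompressed VCF.
    with open("merged.vcf") as handle:
        for cls, dp in sorted(mean_dp_by_genotype(handle).items()):
            print(f"{cls}\tmean DP = {dp:.2f}")
```

If the heterozygous class consistently shows a higher mean DP than both homozygous classes, that is the behaviour I am describing.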
Is it normal for heterozygous genotypes to have higher DP than homozygous ones? If not, what could be the cause? If so, how can I deal with this when filtering the VCF?
Thank you in advance.