I am analyzing human exome sequencing data and looking for clinically relevant SNPs. I am using the standard GATK workflow, applying a hard filter, then annotating with SnpEff and looking for ClinVar SNPs.
Overall, about 1 in 25,000 exome bases is reported as a SNP at the end of the GATK workflow. In addition, a single human exome yields about 450 SNPs that ClinVar annotates with known disease states.
This seems quite high to me. Does anyone have a good idea of what SNP frequency I should expect for a normal, healthy human exome? I assume I have many false positives because of my crude hard-filtering method, but these SNPs survived the entire GATK workflow, including recalibration, so I expected them to be of higher quality.
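To put the 1-in-25,000 rate in absolute terms, here is a rough sketch of the arithmetic. The capture target sizes are assumptions for illustration only (I have not stated my kit's size here), and the function names are hypothetical helpers, not part of GATK or SnpEff.

```python
def snps_from_rate(target_bases: int, bases_per_snp: int) -> float:
    """Convert a '1 SNP per N bases' rate into an absolute SNP count."""
    return target_bases / bases_per_snp


def clinvar_fraction(clinvar_snps: int, total_snps: float) -> float:
    """Fraction of called SNPs that carry a ClinVar disease annotation."""
    return clinvar_snps / total_snps


BASES_PER_SNP = 25_000   # observed rate from the post: 1 SNP per 25,000 bases
CLINVAR_SNPS = 450       # observed ClinVar-annotated SNPs per exome

# ASSUMPTION: hypothetical capture target sizes in base pairs.
ASSUMED_TARGET_SIZES = [30_000_000, 50_000_000, 60_000_000]

for size in ASSUMED_TARGET_SIZES:
    total = snps_from_rate(size, BASES_PER_SNP)
    frac = clinvar_fraction(CLINVAR_SNPS, total)
    print(f"{size / 1e6:.0f} Mb target: {total:.0f} SNPs, "
          f"{frac:.1%} with ClinVar annotation")
```

Under any of these assumed target sizes, the 450 ClinVar SNPs would be a sizeable share of all reported SNPs, which is part of why the numbers look odd to me.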
Thanks for any perspective.