I have been using GATK for SNP calling and wanted to find the "private" (unique) SNPs in my case and control groups, i.e. those absent from 1000G Phase 3 and dbSNP 142. I used CombineVariants to annotate which variants were unique to my call set and then SelectVariants to extract them. The strange thing is that my controls come from the 1000G Great Britain (GBR) cohort, yet I still get over 400,000 unique SNPs after filtering against 1000G and dbSNP.
Do you think this is because joint genotyping my cases and controls together helped GATK discover new SNPs in the GBR set?
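To be clear about what I mean by "unique": a variant in my call set whose (chrom, pos, ref, alt) does not appear in either reference set. The Python below is only a rough sketch of that logic on toy VCF lines. It is not my actual GATK commands, and the function names and records are made up for illustration. It also assumes all files use the same reference build and the same allele representation.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from tab-separated VCF lines."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Split multi-allelic records so each ALT allele is compared separately.
        for alt in alts.split(","):
            if alt != ".":
                keys.add((chrom, pos, ref, alt))
    return keys


def private_variants(call_lines, *reference_sets):
    """Return keys present in the call set but in none of the reference sets."""
    known = set()
    for ref_lines in reference_sets:
        known |= variant_keys(ref_lines)
    return variant_keys(call_lines) - known


# Toy records (hypothetical, for illustration only).
calls = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\t.",
    "1\t200\t.\tC\tT,G\t50\tPASS\t.",
    "2\t300\t.\tG\tA\t50\tPASS\t.",
]
kg = [
    "1\t100\trs1\tA\tG\t.\tPASS\t.",
    "1\t200\trs2\tC\tT\t.\tPASS\t.",
]
dbsnp = [
    "2\t300\trs3\tG\tC\t.\tPASS\t.",
]

print(sorted(private_variants(calls, kg, dbsnp)))
```

Note that in this sketch a site counts as unique when the position is known but the ALT allele differs, so matching on position alone would give a smaller number.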
Also troubling: ANNOVAR annotates a handful of SNPs in my control (GBR) samples as present in 1000g2014, but when I look them up in the 1000G browser, it says there is no data for that region.
Any explanation or suggestion would help.
Thanks in advance!