I'm looking for rare variants in whole-genome sequencing data. I found a "rare" SNP in my patient sample that is not present in any database, including the latest 1000 Genomes and exome sequencing databases. However, when I checked this position in 4 randomly chosen control whole genomes from 1000G, it turned out to lie within a GC-rich region that is barely covered by any reads (in my own data, the sequencer gets through this GC-rich region and coverage is good).
So I'm not sure whether the SNP I found is really rare, or whether it is a common one that was simply missed by NGS in 1000G because PCR cannot amplify through the GC-rich region.
But 1000G has a huge number of samples and calls SNPs/indels jointly across all of them; it should be almost impossible for a given region not to be covered by any reads in any sample, right?
So should I trust the 1000 Genomes SNP/indel database for GC-rich regions like this one?
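The coverage check described above can be sketched roughly as follows. This is only an illustration, assuming per-base depth is available as tab-separated `chrom`, `pos`, `depth` lines (the three-column layout that `samtools depth` writes for a single BAM) and that positions missing from the input have zero depth. The function names and the `min_mean_depth` threshold are hypothetical and chosen only for the example.

```python
def gc_fraction(seq):
    """Fraction of G/C among the unambiguous (A/C/G/T) bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def mean_depth(depth_lines, chrom, start, end):
    """Mean depth over the 1-based inclusive window [start, end].

    Positions absent from depth_lines are counted as zero depth
    (assumption: the depth table omits uncovered positions).
    """
    total = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        c, pos, depth = fields[0], int(fields[1]), int(fields[2])
        if c == chrom and start <= pos <= end:
            total += depth
    return total / (end - start + 1)


def poorly_covered(depth_lines, chrom, start, end, min_mean_depth=5.0):
    """True if the window's mean depth is below a (hypothetical) threshold."""
    return mean_depth(depth_lines, chrom, start, end) < min_mean_depth
```

Running this over a window around the SNP for each control, alongside the GC fraction of the reference sequence in that window, would at least show in how many samples the site could not have been called at all.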