I am trying to get a handle on the quality of submissions to dbSNP. The list of validation statuses is given as:
- multiple independent submissions;
- frequency or genotype data;
- submitter confirmation;
- observation of all alleles in at least two chromosomes;
- genotyped by HapMap;
- sequenced in the 1000 Genomes Project.
Points 1, 5 and 6 seem fairly reliable, but I am interested in points 2 and 4.
How accurate does the genotype data need to be in point 2? For example, we carried out sequencing work and found potential SNPs, only to find they were erroneous on resequencing. The data were from pooled DNA, but we did have high-quality counts for both alleles with good coverage.
Edit: DQ has pointed out (thank you) that most genotypes are confirmed by Sanger sequencing. Can I assume that the genotype and allele frequencies in dbSNP are based on genotypes confirmed by a method such as Sanger sequencing?
Thank you for your time.