Question: SNP count per gene, without disease mutations and observation bias
3.5 years ago
wrote:

Is there a painless way of compiling or estimating the number of SNPs per human gene while excluding disease-associated SNPs? Ideally, one would also compensate for observation bias (e.g. particular genes have been sequenced a gazillion times, while others are only covered in genome-wide searches).

The idea is not to know every single SNP ever observed, but rather to get an estimate of the 'degree of polymorphism' of particular genes. I have tried downloading dbSNP, but it contains all kinds of disease-causing mutations, and there is a strong over-emphasis on important genes such as TP53 or ATM. What would work is, e.g., data from genome-wide SNP calling in diverse but healthy populations, or a way to filter out disease-related SNPs and somehow normalize for different gene coverage.

ExAC looks interesting, but if they have such data, I can't find it.

Any help would be appreciated!!

This could be a difficult question, because the set of known disease-causing SNPs is limited: no disease variant list is comprehensive. The following steps may be useful:

  1. Collect all variants in the organism of interest (e.g. dbSNP variants for human).
  2. Collect all the disease variants (from all possible sources, such as ClinVar, COSMIC, HGMD, PharmGKB, etc.).
  3. Do an inverse selection (i.e. keep only the variants unique to dbSNP).
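
The three steps above can be sketched in Python as a set difference followed by a per-gene count. This is only a minimal illustration under stated assumptions: the variant tuples, the gene names `GENE_A` and `GENE_B`, the gene lengths, and the helper names `variant_key` and `snps_per_gene` are all made up for the example, and real input would first have to be parsed from the dbSNP and disease-database files and annotated with genes. Dividing by gene length is an added assumption that only corrects for gene size; it does not remove the observation bias mentioned in the question.

```python
from collections import Counter


def variant_key(chrom, pos, ref, alt):
    """Normalize a variant so that 'chr1' and '1' (or 'a' and 'A') compare equal."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return (chrom, int(pos), ref.upper(), alt.upper())


def snps_per_gene(all_variants, disease_variants, gene_lengths):
    """Count non-disease variants per gene and scale by gene length (per kb).

    all_variants:     iterable of (chrom, pos, ref, alt, gene)
    disease_variants: iterable of (chrom, pos, ref, alt)
    gene_lengths:     dict mapping gene name to length in bp (assumed input)
    """
    disease_keys = {variant_key(*v[:4]) for v in disease_variants}
    counts = Counter()
    seen = set()
    for chrom, pos, ref, alt, gene in all_variants:
        key = variant_key(chrom, pos, ref, alt)
        # Step 3: inverse selection, skipping disease variants and duplicates.
        if key in disease_keys or (key, gene) in seen:
            continue
        seen.add((key, gene))
        counts[gene] += 1
    return {
        gene: {"count": counts[gene], "per_kb": counts[gene] / (length / 1000)}
        for gene, length in gene_lengths.items()
    }


# Toy data, invented for illustration only.
all_variants = [
    ("chr1", 100, "A", "G", "GENE_A"),
    ("chr1", 150, "C", "T", "GENE_A"),
    ("1", 150, "C", "T", "GENE_A"),    # duplicate of the previous record
    ("chr1", 200, "G", "A", "GENE_A"),
    ("chr2", 500, "T", "C", "GENE_B"),
    ("chr2", 900, "G", "C", "GENE_B"),
]
disease_variants = [("1", 200, "G", "A"), ("chr2", 900, "g", "c")]
gene_lengths = {"GENE_A": 2000, "GENE_B": 500}

result = snps_per_gene(all_variants, disease_variants, gene_lengths)
for gene, stats in result.items():
    print(gene, stats["count"], stats["per_kb"])
```

Matching on (chromosome, position, ref, alt) rather than on rsID is a design choice for the sketch: the disease sources listed in step 2 do not all share one identifier scheme, but matching on coordinates assumes every source uses the same genome build.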
