Question: SNP count per gene, without disease mutations and observation bias
3.5 years ago by
Suicyte10 wrote:

Is there a painless way of compiling or estimating the number of SNPs per human gene while excluding disease-associated SNPs? Ideally, one would also compensate for observation bias (e.g. particular genes have been sequenced a gazillion times, while others are only covered in genome-wide searches).

The idea is not to know every single SNP ever observed, but rather to get an estimate of the 'degree of polymorphism' of particular genes. I have tried downloading dbSNP, but it contains all kinds of disease-causing mutations, and there is a strong over-emphasis on important genes such as TP53 or ATM. What would work is, e.g., data from genome-wide SNP calling in a diverse but healthy population, or a way to filter out disease-related SNPs and somehow normalize for differences in gene coverage.

ExAC looks interesting, but if they have such data, I can't find them.

Any help would be appreciated!!

snp • 739 views

This could be a difficult question, as the list of known disease-causing SNPs is not comprehensive. The following steps may be useful:

  1. Collect all variants in the organism of interest (e.g. dbSNP variants for human).
  2. Collect all disease variants (from all possible sources, such as ClinVar, COSMIC, HGMD, PharmGKB, etc.).
  3. Do an inverse selection (i.e. keep only the variants unique to dbSNP).
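
A minimal sketch of these three steps, followed by a per-gene count, might look like the following. Everything here is an assumption made for illustration: variants are represented as (chrom, pos, ref, alt) tuples that have already been parsed from whatever files you downloaded, the gene names and coordinates are invented, and dividing by gene length is only a crude normalization that does not by itself correct for how often a gene has been sequenced.

```python
# Toy data, invented for illustration only (not real variants or genes).
# Gene intervals are 1-based and inclusive: name -> (chrom, start, end).
genes = {
    "GENE_A": ("1", 1000, 2999),  # 2000 bp
    "GENE_B": ("1", 5000, 5999),  # 1000 bp
}

# Step 1: all known variants (e.g. parsed from a dbSNP download).
all_variants = [
    ("1", 1100, "A", "G"),
    ("1", 1500, "C", "T"),
    ("1", 2500, "G", "A"),
    ("1", 5200, "T", "C"),
    ("1", 5800, "A", "C"),
    ("1", 9000, "G", "T"),  # intergenic, falls in no gene
]

# Step 2: union of all disease-associated variants from the chosen sources.
disease_variants = [
    ("1", 1500, "C", "T"),
    ("1", 5800, "A", "C"),
]


def non_disease_variants(variants, disease):
    """Step 3: inverse selection, keep variants absent from the disease set."""
    disease_keys = set(disease)
    return [v for v in variants if v not in disease_keys]


def snps_per_gene(variants, gene_intervals):
    """Count the variants whose position falls inside each gene interval."""
    counts = {name: 0 for name in gene_intervals}
    for chrom, pos, _ref, _alt in variants:
        for name, (g_chrom, start, end) in gene_intervals.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts


def snps_per_kb(counts, gene_intervals):
    """Divide each count by gene length in kb (length normalization only)."""
    return {
        name: counts[name] / ((end - start + 1) / 1000)
        for name, (_chrom, start, end) in gene_intervals.items()
    }


kept = non_disease_variants(all_variants, disease_variants)
counts = snps_per_gene(kept, genes)
density = snps_per_kb(counts, genes)
print(counts)
print(density)
```

The nested loop is fine for a toy example; for genome-scale data an interval index or a dedicated interval tool would be the more practical choice. Matching on (chrom, pos, ref, alt) also assumes that both sources use the same genome build and the same allele representation.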
modified 3.5 years ago • written 3.5 years ago by cpad011214k