I was looking for some basic statistics on SNPs in the human genome. I looked at several databases and FAQ pages but couldn't find the details. Do you know of any resource or page that tracks this information?
- Total number of SNPs in reference genomes (hg19, HuRef, etc.)
- SNPs in personal genomes (1000 Genomes or any recent personal genome) that are not present in reference genomes
- Private SNPs in personal genomes
- Variants of unknown significance (VUS) per personal genome
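Once you have call sets in hand, the second and third items in the list above come down to set operations over variant keys. Below is a minimal Python sketch. It assumes each variant is keyed as a `(chrom, pos, ref, alt)` tuple on the same assembly. It also assumes that "private" means seen in exactly one sample of whatever cohort you pick. The function names are hypothetical and do not come from any library:

```python
from collections import Counter

# Hypothetical key: (chromosome, 1-based position, REF allele, ALT allele),
# all call sets assumed to be on the same assembly (e.g. hg19).
Variant = tuple[str, int, str, str]


def novel_variants(sample: set[Variant], known: set[Variant]) -> set[Variant]:
    """Variants in one personal genome that are absent from a known catalogue."""
    return sample - known


def private_variants(cohort: dict[str, set[Variant]]) -> dict[str, set[Variant]]:
    """For each sample, the variants that no other sample in the cohort carries."""
    # Count in how many samples each variant occurs (each set holds a variant once).
    counts = Counter(v for variants in cohort.values() for v in variants)
    return {
        name: {v for v in variants if counts[v] == 1}
        for name, variants in cohort.items()
    }
```

"Private" is always relative to the cohort you compare against, so the counts change with the reference panel you choose. VUS counts cannot be derived this way because they need clinical annotation.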
Thanks!
Can you point me to the specific page at dbSNP or COSMIC that provides these statistics for reference genomes (hg19, HuRef, the Korean genome, 1000 Genomes, etc.)? Thanks!
Ah, here are some: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi
Remember that dbSNP build numbers and hgXX assembly versions do not necessarily correspond one-to-one; the pertinent details are in the READMEs. To convert coordinates from one assembly to the other, you may need liftOver: http://genome.ucsc.edu/cgi-bin/hgLiftOver
COSMIC is more disease-oriented, and it doesn't look like it offers any such statistics per se. As mentioned below, you could write some scripts to get these numbers yourself. My vote is BioPython, but I have a heavy Python bias.
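Here is a minimal sketch of such a script. It uses only the Python standard library rather than BioPython, and reads a VCF file. It assumes the ID column has already been annotated with dbSNP rsIDs, so a `.` there is treated as "not in dbSNP". The function names and the file name in the usage line are hypothetical:

```python
import gzip


def open_vcf(path):
    """Open a plain or bgzip/gzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def summarize_snps(lines):
    """Count SNV records, split into known (rsID in ID column) and novel."""
    bases = {"A", "C", "G", "T"}
    total = known = novel = 0
    for line in lines:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        vid, ref, alts = fields[2], fields[3].upper(), fields[4].upper().split(",")
        # Keep only single-nucleotide variants: one-base REF and one-base ALT(s).
        # This drops indels, '.', '*' and symbolic alleles such as <DEL>.
        if ref not in bases or not all(a in bases for a in alts):
            continue
        total += 1
        # Assumption: the ID column was filled with dbSNP rsIDs by an annotator.
        if vid.startswith("rs"):
            known += 1
        else:
            novel += 1
    return {"total": total, "known": known, "novel": novel}
```

Usage, with a hypothetical file name: `with open_vcf("sample.vcf.gz") as fh: print(summarize_snps(fh))`. Counting on the ID column keeps the script trivial. The "novel" count is only as current as the dbSNP build used for annotation, and that build must match the assembly of the calls.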
Thanks. I know about the summary page, but it has no explicit information about private SNPs or VUS.