Resource For Tracking SNP Statistics
12.0 years ago

I am looking for some basic statistics on SNPs in the human genome. I checked several database and FAQ pages but couldn't find the details. Do you know of any resource or page that tracks this information?

  • Total number of SNPs in reference genomes (hg19, HuRef etc... )
  • SNPs in 'personal genomes' (1000genomes or any recent personal genomes) that are not present in reference genomes
  • Private SNPs in personal genomes
  • Variants of Unknown Significance (VUS) per personal genome

Thanks!

snp genome • 2.7k views
12.0 years ago
Gmoney ▴ 220

The resources I use are dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/) and COSMIC (http://www.sanger.ac.uk/genetics/CGP/cosmic/), which are probably the two biggest that come to mind. Both databases are available as flat files, which you can process with your own software.


Can you point me to the specific page at dbSNP/COSMIC that provides the statistics for reference genomes (hg19, HuRef, Korean genome, 1000 Genomes, etc.)? Thanks!


Ah, here are some: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi Remember that the dbSNP build and the hgXX assembly version do not necessarily correspond. You can see the pertinent details in the READMEs. To convert coordinates from one assembly to another you may need to lift them over: http://genome.ucsc.edu/cgi-bin/hgLiftOver

COSMIC is more disease-oriented, and it doesn't look like they publish any summary statistics per se. As mentioned below, you could use some scripting to compute them yourself. My vote is Biopython, but I have a heavy Python bias.
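As a rough illustration of the scripting route, here is a minimal sketch that counts single-nucleotide variant records in a flat file. It assumes the download is tab-delimited VCF (REF in column 4, ALT in column 5); the function name `count_snvs` and the three example records are made up for illustration and are not real dbSNP or COSMIC entries. It uses only the standard library rather than Biopython.

```python
import io


def count_snvs(vcf_lines):
    """Count records where REF and at least one ALT are single bases.

    Assumes tab-delimited VCF-style lines; header lines start with '#'.
    """
    bases = "ACGT"
    total = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and headers
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        alts = fields[4].split(",")  # multi-allelic sites list several ALTs
        ref_is_base = len(ref) == 1 and ref in bases
        if ref_is_base and any(len(a) == 1 and a in bases for a in alts):
            total += 1
    return total


# Made-up example records: one SNV, one deletion, one multi-allelic SNV.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\trsA\tA\tG\t.\t.\t.\n"
    "1\t200\trsB\tAT\tA\t.\t.\t.\n"
    "2\t300\trsC\tC\tG,T\t.\t.\t.\n"
)

snv_count = count_snvs(io.StringIO(EXAMPLE_VCF))
print(snv_count)
```

For a real file you would pass an open file handle (or a `gzip.open(..., "rt")` handle for compressed downloads) instead of the in-memory example.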


Thanks. I know about the summary page, but it has no explicit information about private SNPs or VUS.

12.0 years ago

I have some Perl scripts that I could run on the 1000 Genomes (1kg) data to generate this kind of data if you want. They produce a distribution of shared vs. private variants.


Thanks, Zev. If you have already done some analysis, you could share your results, or upload and share your script. At the moment I am looking for a citable resource that tracks such information.


Hi Zev, I have not been able to find a review or manuscript that provides information on shared/private variants. Can you explain the pipeline you used to find the private variants? Which features do you consider when filtering for private SNPs?


Essentially, my script takes the 1000 Genomes data (or anything else in CDR format) and does set operations on the individuals. It creates a giant bit vector for each individual (0 or 1; 1 = non-reference allele). Each non-reference allele has its own position in the bit vector, so a single locus can have multiple bits. Once the data is constructed I can do any set operation.
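The bit-vector idea described above can be sketched roughly as follows. This is not the Perl script itself but a small Python stand-in, assuming each individual's non-reference alleles are already available as a set of `(chrom, pos, alt)` tuples. The names `build_bit_vectors`, `private_alleles`, and `shared_by_all`, and the three toy individuals, are hypothetical. Python integers serve as arbitrarily long bit vectors.

```python
def build_bit_vectors(genotypes):
    """Map every observed non-ref allele to a bit position, then encode
    each individual as one integer (bit set = carries that allele)."""
    alleles = sorted(set().union(*genotypes.values()))
    index = {allele: i for i, allele in enumerate(alleles)}
    vectors = {}
    for individual, carried in genotypes.items():
        bits = 0
        for allele in carried:
            bits |= 1 << index[allele]
        vectors[individual] = bits
    return alleles, vectors


def decode(alleles, bits):
    """Turn a bit vector back into the list of alleles it represents."""
    return [allele for i, allele in enumerate(alleles) if (bits >> i) & 1]


def private_alleles(alleles, vectors, individual):
    """Alleles carried by `individual` and by nobody else in the set."""
    others = 0
    for name, bits in vectors.items():
        if name != individual:
            others |= bits  # union of everyone else
    return decode(alleles, vectors[individual] & ~others)


def shared_by_all(alleles, vectors):
    """Alleles carried by every individual (set intersection)."""
    shared = (1 << len(alleles)) - 1
    for bits in vectors.values():
        shared &= bits
    return decode(alleles, shared)


# Toy data (made up): note the two different alleles at 2:300,
# which occupy two separate bits for the same locus.
genotypes = {
    "ind1": {("1", 100, "G"), ("1", 200, "T"), ("2", 300, "A")},
    "ind2": {("1", 100, "G"), ("2", 300, "A"), ("2", 300, "C")},
    "ind3": {("1", 100, "G"), ("3", 50, "T")},
}

alleles, vectors = build_bit_vectors(genotypes)
print(private_alleles(alleles, vectors, "ind1"))
print(shared_by_all(alleles, vectors))
```

Because each individual collapses to a single integer, union, intersection, and difference are single bitwise operations, which is what makes a shared-versus-private distribution cheap to compute across many genomes.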
