What Is The Significance Of The Population-Specific dbSNP132 VCF Files?
13.1 years ago
Epowell ▴ 20

I am trying to screen my NGS variants against known variations. The dbSNP132 VCF files seem like a great resource, but I'm not certain if I understand what is in them.

Specifically, what is the significance of the population-specific files? Are these referring to HapMap populations? For example, does this file (ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/v4.0/ByChromosomeNoGeno/01-1409-CEU-nogeno.vcf.gz) contain all the chromosome 1 SNPs from the CEU HapMap population in addition to any 1000Genomes and dbSNP132 SNPs that were also found in this population?

dbsnp vcf hapmap genome population • 4.0k views
13.1 years ago

dbSNP contains SNP data from different submitters, with HapMap and 1000 Genomes being the most important ones when it comes to population information. If you query any SNP through dbSNP's web interface, rs6059134 for instance, you will see that the database includes information from several populations.

Since I presume you are wondering what exactly is inside the files on the dbSNP VCF FTP site, I can only suggest that you check the README file on that site and look at the file naming convention example. From it you can work out that all the numeric population codes in that list correspond to HapMap data only.

14-12162-MKK.vcf.gz

Chr number => 14

dbSNP population ID => 12162 (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewTable.cgi?pop=12162)

Three letter population identifier => MKK (http://ccr.coriell.org/sections/collections/NHGRI/?SsId=11)
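To make the naming convention concrete, here is a minimal Python sketch that splits such a file name into its three parts. The function name `parse_population_vcf_name` and the regular expression are my own assumptions, built only from the README example above and from the `-nogeno` file name mentioned in the question; this is not an official parser.

```python
import re

# Pattern assumed from the README example "14-12162-MKK.vcf.gz".
# The optional "-nogeno" suffix covers names such as
# "01-1409-CEU-nogeno.vcf.gz" from the ByChromosomeNoGeno folder.
_NAME_RE = re.compile(
    r"^(?P<chrom>[^-]+)-(?P<pop_id>\d+)-(?P<pop>[A-Za-z0-9]+)"
    r"(?P<nogeno>-nogeno)?\.vcf\.gz$"
)


def parse_population_vcf_name(filename):
    """Return chromosome, dbSNP population ID and population code.

    Returns None when the name does not follow the population-specific
    convention (for example "00-All.vcf.gz").
    """
    match = _NAME_RE.match(filename)
    if match is None:
        return None
    return {
        "chrom": match.group("chrom"),
        "pop_id": int(match.group("pop_id")),
        "pop": match.group("pop"),
        "nogeno": match.group("nogeno") is not None,
    }


if __name__ == "__main__":
    print(parse_population_vcf_name("14-12162-MKK.vcf.gz"))
    print(parse_population_vcf_name("01-1409-CEU-nogeno.vcf.gz"))
```

The numeric `pop_id` is the value you would plug into the dbSNP population page linked above.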


Jorge: You are absolutely correct (and also much better at interpreting the README than I was).

Now, a second question I had was: what is in the so-called "full build" 00-All.vcf.gz? Presumably, it contains the SNPs from all the listed populations, but what about SNPs that did not originate from a HapMap population? And what about 1000Genomes SNPs? Does this file contain EVERYTHING in dbSNP132?

p.s. I'm happy to post this as a brand new question if that would be better.


The README file states that 00-All.vcf.gz contains "a full build dump", although it doesn't describe it any further. It looks like it is a symlink to the 00-All.vcf.gz file in the ByChromosomeNoGeno folder, so I presume that it contains HapMap SNPs only. If you want to work with all the dbSNP information, I guess you will have to use the full chromosome reports instead. You will find them at ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/chr_rpts/


Actually, in the README inside the ByChromosomeNoGeno folder it says that:

This directory contains VCF files for each chromosome and HapMap populations available in dbSNP as well as a file containing all SNPs in dbSNP with the exception of microsatellites, named variations, and other multi-byte variations where the adjacent nucleotides are unknown.

Maybe the 00-All.vcf.gz file is the one mentioned as "containing all SNPs in dbSNP". What do you think? Presumably, that would include 1000Genomes, too.


It would be fairly simple to check that: if the number of SNPs in that file is ~4M, these are HapMap only; if it is ~28M, it contains all the known SNPs in dbSNP132. The README file suggests the latter, so go for it and let us know ;)
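For the count itself, a minimal Python sketch along these lines should do. It assumes a local, gzip-compressed copy of the file and treats every non-header line as one variant record, which is only an approximation of the number of distinct SNPs; the function name `count_vcf_records` is hypothetical.

```python
import gzip


def count_vcf_records(path):
    """Count data lines (non-empty, not starting with '#') in a gzipped VCF."""
    count = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Header and meta-information lines start with '#'.
            if line.strip() and not line.startswith("#"):
                count += 1
    return count


if __name__ == "__main__":
    import sys

    # Usage (path is whatever you downloaded, e.g. a local 00-All.vcf.gz):
    #   python count_vcf_records.py 00-All.vcf.gz
    print(count_vcf_records(sys.argv[1]))
```

A count in the low millions would point to HapMap only, while something in the tens of millions would point to the full dbSNP132 set.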
