Question: Human Variation Databases
Asked by Interact, 8.0 years ago:

Hello

I was hoping to find someone with some experience of human variation databases.

How long will it take for the SNPs discovered in the 1000 Genomes Project to make their way into public databases like dbSNP?

Which human variation database has the best coverage of SNPs? Will all of the SNPs in dbSNP be covered in HGMD, and vice versa?

Which database is the 'best' if you are interested in investigating SNPs in a clinical context?

Are there any pros/cons of the different databases (dbSNP/HGVbase/HGMD)?

Thank you

Tags: genome, sv, database, snp
(Question last modified 5.4 years ago by Charles Warden.)
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 8.0 years ago:

dbSNP132 includes data from the 1000 Genomes Project pilot 1, 2, and 3 studies (http://www.ncbi.nlm.nih.gov/mailman/pipermail/dbsnp-announce/2010q4/000097.html).

For the difference between dbSNP and HGMD, see this previous question: http://biostar.stackexchange.com/questions/2817

Answer by Khader Shameer (Manhattan, NY), 8.0 years ago:

I would like to answer two specific aspects of your question:

Which database is the 'best' if you are interested in investigating SNPs in a clinical context?

If you are looking for SNPs with clinical relevance, you could check dbGaP and PharmGKB. dbGaP provides results from genome-wide association studies, whereas PharmGKB provides results from candidate-gene and genome-wide studies of pharmacogenomic variants.

dbGaP is a difficult resource to explore because access to phenotype/trait and related data is restricted. You can get GWAS results (significant SNPs, P-values, odds ratios) through HuGE Navigator, but not the clinical or raw data.
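If you download GWAS summary results (SNP, P-value, OR) from a resource like HuGE Navigator, a common first step is to keep only associations that pass the conventional genome-wide significance threshold of 5 × 10^-8. The Python sketch below assumes a simple hypothetical record layout (`GwasHit` with `snp`, `p_value`, `odds_ratio`) and uses made-up values; it is not the export format of any particular database.

```python
from dataclasses import dataclass

# Conventional genome-wide significance threshold used in GWAS.
GENOME_WIDE_P = 5e-8


@dataclass
class GwasHit:
    """One association result; the field names are illustrative assumptions."""
    snp: str
    p_value: float
    odds_ratio: float


def significant_hits(hits, threshold=GENOME_WIDE_P):
    """Return hits with p strictly below the threshold, most significant first."""
    kept = [h for h in hits if h.p_value < threshold]
    return sorted(kept, key=lambda h: h.p_value)


if __name__ == "__main__":
    # Hypothetical results, not real dbGaP or HuGE Navigator data.
    hits = [
        GwasHit("snp_a", 3e-9, 1.25),
        GwasHit("snp_b", 2e-6, 1.10),
        GwasHit("snp_c", 1e-12, 0.80),
    ]
    for h in significant_hits(hits):
        print(h.snp, h.p_value, h.odds_ratio)
```

Passing the threshold only tells you an association is statistically robust in that study; it says nothing about clinical actionability, which is why the curated resources above still matter.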

Which human variation database has the best coverage of SNPs? Will all of the SNPs in dbSNP be covered in HGMD, and vice versa?

I am not sure about the coverage, but you can check Ensembl Variation; the recent paper describing the Ensembl Variation resources gives more detail on its features. If you are interested in annotating SNPs, you may also try other variation annotation databases such as Varietas or SCANDB.


Khader, I am adding some of the resources from your answer to the article at WikiGenes. I will cite this answer from Biostar.

(Reply by Giovanni M Dall'Olio, 7.9 years ago)

@Giovanni: Are you editing the article directly or adding it to the discussion page?

(Reply by Khader Shameer, 7.9 years ago)
Answer by Larry_Parnell (Boston, MA USA), 8.0 years ago:

Question: Which human variation database has the best coverage of SNPs?

That is tough to answer, because what do you mean by coverage? If you're looking across the whole human genome, then mine dbSNP. If you're interested in something more specific, say variants in CYP (P450) genes or mitochondrial genome differences, then you're best served by specialised databases.

The question of SNPs in a clinical setting is really hard, in my mind, because the field is evolving quite rapidly. Do you want the 200,000 or so SNPs that 23andMe, for example, adds to its chip because there is evidence for an association of some type? For those SNPs, is premature gray hair or detecting asparagus byproducts in urine really relevant? Do you want the SNPs that are in OMIM because they have been found in medical cases? Do you want those that are routinely tested for in newborn metabolic screening or pre-pregnancy counseling? Or do you want to consider the loads of new variants found by sequencing cancer genomes, especially the SNPs that can "tag" a copy number variant? This is not a pool of SNPs but an amorphous cloud whose boundaries are not well defined.

I know this is not an answer, per se. I don't have one because I don't need such SNPs for my research. These are just some thoughts I'd pose to my colleagues before we tried to capture such information. Good luck!

Answer by Biomed (Bethesda, MD, USA), 8.0 years ago:

The SNVs discovered in the 1000 Genomes Project are routinely added to dbSNP. As Pierre mentioned, dbSNP132 is the most recent version; it annotates most SNPs with 1000 Genomes data and also contains many new, mostly lower-frequency SNPs submitted through the 1000 Genomes Project. I highly recommend that everyone use this release.

dbSNP has the best coverage in general, but if you are interested in a specific disease, there are many locus-specific databases that hold more variation data for a particular region or disease.

Clinical context is a hard one, but I recommend using dbSNP as a first-pass filter and then digging into locus-specific databases. HGMD is very good (it has disease annotations and literature links) but is not error-free, so be careful with it as well.
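The "dbSNP as a first-pass filter" step amounts to splitting your calls into variants already catalogued in dbSNP and novel ones that need a closer look. The Python sketch below assumes both sets use the same genome build and the same variant normalization, and that the dbSNP set has already been loaded (for example from a dbSNP VCF); `Variant` and `split_known_novel` are hypothetical names and the data are made up.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """A variant keyed by position and alleles (same genome build assumed)."""
    chrom: str
    pos: int  # 1-based, as in VCF
    ref: str
    alt: str


def split_known_novel(
    calls: Iterable[Variant], dbsnp: Set[Variant]
) -> Tuple[List[Variant], List[Variant]]:
    """First-pass filter: separate calls already in dbSNP from novel ones."""
    known: List[Variant] = []
    novel: List[Variant] = []
    for v in calls:
        # Exact match on chrom/pos/ref/alt; indels must be normalized the same way.
        (known if v in dbsnp else novel).append(v)
    return known, novel


if __name__ == "__main__":
    # Hypothetical data; in practice the dbSNP set would be loaded from a file.
    dbsnp = {Variant("1", 1000, "A", "G")}
    calls = [Variant("1", 1000, "A", "G"), Variant("1", 2000, "C", "T")]
    known, novel = split_known_novel(calls, dbsnp)
    print("known:", known)
    print("novel:", novel)
```

Matching on exact alleles rather than position alone avoids treating a different alternate allele at a known site as "known", which matters when the novel calls are the ones you take on to locus-specific databases.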

I strongly discourage using dbSNP130 or earlier for any clinical correlation unless you are only looking at very common SNPs; those releases contain many SNPs with poor validation and frequency metadata (one-off submissions from a single sequence, etc.).
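To see which rsIDs an older release such as dbSNP130 would not contain, you can look at the build in which each rsID first appeared. The Python sketch below assumes the INFO column of a dbSNP VCF carries a `dbSNPBuildID` key giving the first build for the RS, as in NCBI's dbSNP VCF downloads; check the `##INFO` header of the file you actually use before relying on it. The helper names are hypothetical and the example lines in the usage note are made up.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO string (KEY=VALUE;FLAG;...) into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        # Flags without '=' are stored as True.
        fields[key] = value if sep else True
    return fields


def first_build(vcf_line: str):
    """Return (rsID, first dbSNP build or None) for one VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    rs_id, info = cols[2], cols[7]
    build = parse_info(info).get("dbSNPBuildID")  # assumed INFO key
    return rs_id, int(build) if build is not None else None


def added_after(lines, build_cutoff=130):
    """Yield rsIDs first seen after build_cutoff, i.e. absent from that release."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip VCF header lines
        rs_id, build = first_build(line)
        if build is not None and build > build_cutoff:
            yield rs_id
```

For example, feeding it a header line plus data lines whose INFO fields read `dbSNPBuildID=129` and `dbSNPBuildID=132` yields only the second rsID. Note that the first build says nothing about validation status; it only tells you whether the rsID existed in a given release.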


Any guess about the false-positive rate in HGMD?

(Reply by Tarbem, 7.1 years ago)

I don't know of any published numbers, but there are certainly cases (although not many) where a gene/variant is mentioned in a paper and so ends up in HGMD, yet when you read the paper you see that it is a negative finding. So in my opinion HGMD is a useful tool for a first-pass survey, but its output needs a human review before you draw solid conclusions.

(Reply by Biomed, 7.1 years ago)
Answer by Giovanni M Dall'Olio (London, UK), 8.0 years ago:

Have a look at the following article:

Church DM, Lappalainen I, Sneddon TP, Hinton J, Maguire M, Lopez J, Garner J, Paschall J, DiCuccio M, Yaschenko E, Scherer SW, Feuk L, Flicek P. Public data archives for genomic structural variation. Nat Genet. 2010 Oct;42(10):813-4. PubMed PMID: 20877315.

It describes DGVa and dbVar, the structural variation archives at EBI and NCBI, which provide better-annotated structural variants.

Answer by Larry_Parnell (Boston, MA USA), 8.0 years ago:

EuroGentest may be a valuable and informative place to look. EuroGentest is a European initiative dealing with all aspects of genetic testing: Quality Management, Information Databases, Public Health, New Technologies and Education. Here is a direct link to the Information Databases page.

Answer by Malachi Griffith (Washington University School of Medicine, St. Louis, USA), 6.8 years ago:

The post 'How to search disease-causing chromosomal structure variation?' lists some useful resources, including an updated review article along the lines of the one suggested by Giovanni:

Sneddon TP, Church DM. Online resources for genomic structural variation. Methods Mol Biol. 2012;838:273-89. PubMed PMID: 22228017.

Answer by Charles Warden (Duarte, CA), 5.4 years ago:

I don't think you need to pick a single database.

For example, ANNOVAR is a popular tool for variant annotation. The basic report includes 1000 Genomes, ESP, and dbSNP annotations (as well as functional predictions for coding variants). It also lets you search a number of additional databases (I usually use the basic report plus annotations from the GWAS catalog).

http://www.openbioinformatics.org/annovar/

There is also a web-based version of ANNOVAR (wANNOVAR), but I think the local installation may provide a greater range of functionality.
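ANNOVAR's input format is one variant per line whose first five columns are chromosome, start, end, reference allele and observed allele. ANNOVAR ships `convert2annovar.pl` for converting VCF files, and that is what you should normally use; the Python sketch below only illustrates the mapping for simple biallelic SNVs and deliberately returns `None` for indels and multi-allelic sites, whose coordinate conventions are more involved. The function name is hypothetical.

```python
def vcf_snv_to_avinput(vcf_line: str):
    """Convert one VCF data line to an ANNOVAR-style input line.

    Handles biallelic SNVs only; returns None for headers and anything else.
    """
    if vcf_line.startswith("#"):
        return None
    chrom, pos, _id, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    # Single-base REF and ALT; a multi-allelic ALT such as "G,T" fails the length test.
    is_snv = len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT"
    if not is_snv:
        return None
    # For an SNV, start and end are both the VCF position.
    return "\t".join([chrom, pos, pos, ref, alt])
```

Skipping indels rather than guessing their coordinates keeps the sketch honest: VCF pads indels with a leading reference base, while ANNOVAR represents them differently, which is exactly the bookkeeping the official converter handles.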

SeattleSeq Annotation is another popular tool (although I personally like ANNOVAR a little better):

http://snp.gs.washington.edu/SeattleSeqAnnotation137/

If you are studying your own personal genome, you can also check out Promethease (although I wouldn't consider that a standard practice for scientific publications). It uses SNPedia for annotations.

http://snpedia.com/index.php/Promethease
