Question: SNPs Present in dbSNP but Absent from the 1000 Genomes and ESP Databases
michealsmith wrote (6.9 years ago):

Which database would you choose for filtering out common SNPs to find rare ones that may be disease-causing? I have been using 1000 Genomes as well as the ESP (Exome Sequencing Project) database. ESP is derived from exome data of about 6,500 individuals, which is reasonably large. Both databases also report MAF. I don't use dbSNP at first, because it contains nearly everything, so filtering against it is less permissive.
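A minimal sketch of the MAF-based filtering described above, assuming each variant has already been annotated with a MAF from 1000 Genomes and from ESP. The function name, the tuple layout, and the 1% cutoff are hypothetical choices, not a standard. Note that a variant absent from both databases passes this filter by default, which is exactly the situation rs73979896 falls into.

```python
from typing import Optional

RARE_MAF_CUTOFF = 0.01  # hypothetical threshold; choose per study design


def is_candidate_rare(maf_1000g: Optional[float], maf_esp: Optional[float],
                      cutoff: float = RARE_MAF_CUTOFF) -> bool:
    """Keep a variant only if every database that reports a MAF puts it below cutoff.

    None means the variant was not found in that database. Absence is treated
    as "no evidence of being common", so it never removes the variant.
    """
    reported = [m for m in (maf_1000g, maf_esp) if m is not None]
    return all(m < cutoff for m in reported)


# Hypothetical annotations: (MAF in 1000 Genomes, MAF in ESP)
variants = {
    "common_in_esp": (0.002, 0.30),
    "rare_in_both": (0.001, 0.004),
    "absent_from_both": (None, None),
}
kept = [name for name, (g, e) in variants.items() if is_candidate_rare(g, e)]
print(kept)  # ['rare_in_both', 'absent_from_both']
```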

However, today I found something odd about some SNPs, for example rs73979896: http://genome.ucsc.edu/cgi-bin/hgc?hgsid=308088757&c=chr17&o=21319207&t=21319208&g=snp135Common&i=rs73979896

This SNP, which is nonsynonymous, is present in dbSNP 135 with a very high MAF of 49%, derived from around 2,204 alleles; however, it is absent from both 1000 Genomes (2012-Apr release) and ESP6500 (the latest version, with exome data from 6,500 individuals)! If this is really a true SNP with MAF=49%, how can it NOT be captured by ESP, which has data from 6,500 people? This is very confusing.
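To put a number on this intuition: under a simple model where each sampled allele independently carries the minor allele with probability p (an assumption that ignores population structure, coverage gaps, and calling filters), the chance of seeing zero minor alleles among 2N sampled alleles is (1 - p)^(2N). With p = 0.49 and N = 6,500 individuals this is astronomically small, so absence from ESP points to a technical cause (filtering, mapping, or a false-positive call elsewhere) rather than to sampling chance. The helper name below is hypothetical.

```python
import math


def log10_prob_never_observed(maf: float, n_individuals: int) -> float:
    """log10 of P(no minor allele among 2N diploid alleles) = 2N * log10(1 - maf)."""
    n_alleles = 2 * n_individuals
    return n_alleles * math.log10(1.0 - maf)


# Minor-allele count implied by dbSNP's estimate: 49% of ~2,204 alleles.
print(round(0.49 * 2204))  # 1080
# ESP6500: about 6,500 individuals, i.e. about 13,000 alleles.
print(round(log10_prob_never_observed(0.49, 6500)))  # -3802
```

In other words, under this model the probability of missing the allele in ESP is about 10^-3802.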

Laura (Cambridge, UK) wrote (6.9 years ago):

This SNP was part of the original 1000 Genomes call set:

tabix -h http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/input_call_sets/ALL.wgs.union_vqsr2b.20101123.snps.low_coverage.sites.vcf.gz 17:21319208-21319208
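The links in this thread mix coordinate conventions. The UCSC link addresses the SNP as a 0-based, half-open interval (o=21319207, t=21319208) on chr17, whereas the tabix region above is 1-based and inclusive (17:21319208-21319208) and uses the chromosome name without the chr prefix. A small sketch of the conversion (the helper name is hypothetical) helps avoid off-by-one lookups when querying different resources.

```python
def ucsc_to_tabix_region(chrom: str, start0: int, end: int) -> str:
    """Convert a UCSC-style 0-based half-open interval into a 1-based inclusive
    tabix region string, dropping a leading 'chr' as in the 1000 Genomes VCFs."""
    if end <= start0:
        raise ValueError("end must be greater than start")
    name = chrom[3:] if chrom.startswith("chr") else chrom
    # 0-based half-open [start0, end) covers 1-based positions start0+1 .. end
    return f"{name}:{start0 + 1}-{end}"


print(ucsc_to_tabix_region("chr17", 21319207, 21319208))  # 17:21319208-21319208
```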

If you look at the reference genome, this is a patched part of the reference, and the site also does not fall within our strict accessibility mask:

http://browser.1000genomes.org/Homo_sapiens/Location/View?r=17%3A21319208-21319208#r=17:21316709-21321708

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/supporting/accessible_genome_masks/README_20120824_accessibility_mask_bed_files
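Checking whether a site falls inside an accessibility mask amounts to an interval lookup. The mask files described in the README above are BED files, and BED intervals are 0-based and half-open, so a 1-based VCF position pos is covered by an interval (start, end) when start < pos <= end. The sketch below assumes the BED text is already in memory, that its intervals are merged (non-overlapping), and that its chromosome names match the query; the function names and the interval values in the example are made up, not taken from the real mask.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines (chrom, 0-based start, end) into sorted per-chromosome lists."""
    intervals = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals[chrom].append((int(start), int(end)))
    for chrom in intervals:
        intervals[chrom].sort()
    return intervals


def in_mask(intervals, chrom, pos1):
    """True if 1-based position pos1 lies in some BED interval (start < pos1 <= end)."""
    ivs = intervals.get(chrom, [])
    # index of the last interval whose 0-based start is < pos1
    i = bisect.bisect_left(ivs, (pos1, -1)) - 1
    return i >= 0 and ivs[i][1] >= pos1


# Made-up intervals for illustration only
mask = load_bed(["17\t21300000\t21319000\n", "17\t21320000\t21330000\n"])
print(in_mask(mask, "17", 21319208))  # False (falls in the gap)
print(in_mask(mask, "17", 21320001))  # True
```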

It might be a false negative on our part; it might also be a false positive on the part of the other groups who have called it.


Thanks. I'm curious why this SNP was filtered out later. Was it because of low coverage? Also, I checked several BAM files from unrelated individuals NOT from 1000 Genomes, and this SNP does exist in them. However, in BAM files from 1000 Genomes, for example NA12878 and NA12889, this SNP is not there. The problem is that this SNP has a MAF of about 50%; it's a common allele, not a rare one. Different groups should be consistent for common SNPs, right? That's where I'm confused.

(reply by michealsmith, 6.9 years ago)

Also, what's special about genome patches in terms of variant calling? Are patched regions supposed to be regions that harbor many mutations?

(reply by michealsmith, 6.9 years ago)

Patched regions of the assembly are more likely to be regions with highly repetitive sequence, or regions that are otherwise hard to assemble and therefore hard to map reads to. That could be part of the issue here as well.

I do use dbSNP as well as 1000G and ESP MAFs, but I tend to stick to older dbSNP versions. For newer versions I would go by the estimated MAF rather than simple presence/absence, as I've seen dbSNP entries with no population data at all, observed in, say, only one individual.
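The difference between filtering on presence/absence and filtering on estimated MAF can be sketched as two rules applied to the same records. The record layout, the labels, and the 1% cutoff are hypothetical; the key design choice is that a dbSNP entry with no population frequency is reported as unknown instead of being treated as common.

```python
from typing import Optional


def presence_filter(in_dbsnp: bool) -> str:
    """Naive rule: any dbSNP entry is treated as a known common polymorphism."""
    return "common" if in_dbsnp else "novel"


def maf_filter(in_dbsnp: bool, maf: Optional[float], cutoff: float = 0.01) -> str:
    """MAF-based rule: only a reported frequency at or above cutoff marks a variant common."""
    if not in_dbsnp:
        return "novel"
    if maf is None:
        return "unknown-frequency"  # e.g. a submission seen in a single individual
    return "common" if maf >= cutoff else "rare"


# Hypothetical records: (present in dbSNP, MAF reported by dbSNP or None)
records = [(True, 0.49), (True, None), (True, 0.002), (False, None)]
print([presence_filter(p) for p, _ in records])
# ['common', 'common', 'common', 'novel']
print([maf_filter(p, m) for p, m in records])
# ['common', 'unknown-frequency', 'rare', 'novel']
```

Under the naive rule, the entry without population data and the genuinely rare entry would both be discarded along with the common one.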

(reply by Dan Gaston, 6.9 years ago)
JC (Mexico) wrote (6.9 years ago):

I don't know why 1000G and ESP are missing this SNP; it could be a low-coverage area or many other factors. But Kaviar has it: http://db.systemsbiology.net/kaviar/cgi-pub/Kaviar2.pl?chr=chr17&pos=21319207
