https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5449402/ - This study reports that the human APOE gene has 183 validated SNPs, of which 31 are missense, 21 synonymous, 2 nonsense, 98 intronic, 7 5′ UTR, 6 3′ UTR, 7 downstream, 8 upstream, 1 splice donor and 2 splice acceptor variants. These data were collected from dbSNP. I would like to know how to collect this validated SNP dataset from dbSNP myself.
Question: How to collect a SNP dataset from databases for SNP analysis?
Asked 2.2 years ago by arr234; modified 2.2 years ago by Pierre Lindenbaum.
Kevin Blighe wrote:
The link provided by maryamtavasoli71 relates to eQTL studies, which is not what you want.
If you are not comfortable using the command line and working with dbSNP data locally, you can use the Ensembl Genome Browser to look up all variants in a particular gene. HERE is a search configured for APOE.
Click the Excel® sheet icon (at right) to download the data as CSV.
The data include scores from in silico predictors such as SIFT, PolyPhen, MutationAssessor and CADD. After some quick filtering of my own, I identified roughly 200 'damaging' variants in the gene.
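A crude way to do this kind of filtering from the shell is sketched below. It assumes the exported CSV stores SIFT and PolyPhen calls as text such as `deleterious(0.01)` or `probably damaging` / `probably_damaging(0.99)`, and that the first line is a header; `filter_damaging` and the file names are hypothetical. Because it matches keywords anywhere in a row, it does not know which column a word came from and would also keep low-confidence "deleterious" labels, so treat the result as a starting point rather than a proper per-column filter.

```shell
# Hypothetical helper: crude keyword filter over a CSV exported from the
# Ensembl variant table. It prints the header line, then every data row that
# mentions a SIFT "deleterious" call or a PolyPhen "probably damaging" /
# "probably_damaging" call anywhere in the row (case-insensitive).
# It does not check which column the keyword came from.
filter_damaging() {
  local csv="$1"
  head -n 1 "$csv"
  # grep exits 1 when nothing matches; "|| true" keeps that from looking like an error.
  tail -n +2 "$csv" | grep -i -E 'deleterious|probably[ _]damaging' || true
}

# Usage (file names are hypothetical):
#   filter_damaging APOE_variants.csv > APOE_damaging.csv
```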
Kevin
Pierre Lindenbaum wrote:
Using the UCSC public MySQL server:
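# Count dbSNP build 142 records (UCSC hg38 table snp142) inside chr19:44905749-44909395 (APOE), grouped by functional class and validation status
$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306 -D hg38 -e 'select func,valid,count(*) from snp142 where chrom="chr19" and chromStart>=44905749 and chromEnd<=44909395 group by func,valid'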
+----------------------------+--------------------------------------------------------+----------+
| func                       | valid                                                  | count(*) |
+----------------------------+--------------------------------------------------------+----------+
| coding-synon               | unknown                                                |       13 |
| coding-synon               | by-frequency                                           |        1 |
| coding-synon               | by-1000genomes                                         |        3 |
| coding-synon               | by-cluster,by-1000genomes                              |        2 |
| coding-synon               | by-frequency,by-1000genomes                            |        2 |
| intron                     | unknown                                                |       25 |
| intron                     | by-cluster                                             |        4 |
| intron                     | by-1000genomes                                         |       33 |
| intron                     | by-cluster,by-1000genomes                              |        3 |
| intron                     | by-frequency,by-1000genomes                            |       24 |
| intron                     | by-cluster,by-frequency,by-1000genomes                 |       11 |
| near-gene-5                | by-cluster,by-frequency,by-1000genomes                 |        1 |
| nonsense                   | unknown                                                |        2 |
| missense                   | unknown                                                |       28 |
| missense                   | by-cluster                                             |        8 |
| missense                   | by-1000genomes                                         |       11 |
| missense                   | by-frequency,by-1000genomes                            |        7 |
| missense                   | by-cluster,by-frequency,by-1000genomes                 |        1 |
| missense                   | by-cluster,by-frequency,by-2hit-2allele,by-1000genomes |        2 |
| missense                   | by-frequency,by-hapmap,by-1000genomes                  |        1 |
| intron,missense            | by-1000genomes                                         |        1 |
| intron,missense            | by-frequency,by-1000genomes                            |        1 |
| intron,missense            | by-cluster,by-frequency,by-hapmap,by-1000genomes       |        1 |
| frameshift                 | unknown                                                |        1 |
| cds-indel                  | unknown                                                |        2 |
| untranslated-3             | unknown                                                |        2 |
| untranslated-3             | by-1000genomes                                         |        1 |
| untranslated-3             | by-frequency,by-1000genomes                            |        3 |
| untranslated-5             | by-1000genomes                                         |        3 |
| intron,untranslated-5      | by-1000genomes                                         |        1 |
| intron,untranslated-5      | by-frequency,by-1000genomes                            |        1 |
| near-gene-5,untranslated-5 | by-1000genomes                                         |        1 |
| splice-3                   | unknown                                                |        1 |
| splice-3                   | by-cluster                                             |        1 |
| splice-5                   | by-cluster                                             |        1 |
+----------------------------+--------------------------------------------------------+----------+
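To compare these numbers with a "validated" count like the paper's, one option is to drop the rows whose valid field is `unknown` and total the rest per functional class. The sketch below assumes the same query is re-run with the mysql client's `-N` (skip column names) and `-B` (tab-separated batch output) flags; `summarize_validated` is a hypothetical helper name, and treating every value other than `unknown` as "validated" is an assumption, not necessarily the paper's exact criterion.

```shell
# Hypothetical helper: sum dbSNP record counts per functional class,
# skipping rows whose `valid` field is exactly "unknown".
# Input: tab-separated "func<TAB>valid<TAB>count" lines with no header,
# i.e. the output of the mysql client run with -N -B.
summarize_validated() {
  awk -F'\t' '
    # Any validation evidence other than "unknown" is treated as validated.
    $2 != "unknown" { total[$1] += $3 }
    END { for (f in total) printf "%s\t%d\n", f, total[f] }
  ' | LC_ALL=C sort   # awk array order is unspecified, so sort the output
}

# Usage: the query above, with -N (no column names) and -B (tab-separated output):
# mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306 -D hg38 -N -B \
#   -e 'select func,valid,count(*) from snp142 where chrom="chr19" and chromStart>=44905749 and chromEnd<=44909395 group by func,valid' \
#   | summarize_validated
```

A functional class whose rows are all `unknown` (nonsense, frameshift and cds-indel in the table above) simply drops out of the output.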
ExSNP database (http://www.exsnp.org/DZeQTL)