Question: Help obtaining splice site data from dbSNP!
4.6 years ago by
Parth Patel40
United States
Parth Patel40 wrote:

Hi all,

I have been trying to download the splice site mutations found in dbSNP, along with their corresponding genomic coordinates and nucleotide changes. I have spent way too much time trying to create a local MySQL instance of dbSNP from this site: https://cgsmd.isi.edu/dbsnpq/downloads.php#dbSNPdownloadsTables . The SQL dumps and .txt dumps are not UTF-8 encoded, and I cannot figure out which encoding makes them render as normal text.

Any suggestions on ways to get this data from dbSNP? Any help would be much appreciated!

Thank you!

Parth

modified 4.6 years ago • written 4.6 years ago by Parth Patel40
4.6 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

I don't think you need to make it that complicated. Could you try an Ensembl BioMart query instead?

Like so>

Not sure if you will be able to download everything that way though. Edit: works fine, please follow Emily's suggestion to get the results file.

Edit: you might need to modify the query to suit your needs
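
For readers who prefer to script the query, here is a minimal sketch of how a BioMart XML query of this kind could be assembled in Python. It is not the query linked above. The dataset name `hsapiens_snp`, the filter `so_parent_name`, and the attribute names are assumptions about the Ensembl variation mart and should be checked against what the XML button in the BioMart web interface shows for your own query. `build_biomart_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart XML query string (hypothetical helper)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # One <Filter> element per filter name/value pair.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # One <Attribute> element per requested output column.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# All names below are assumptions; verify them in the BioMart interface.
query_xml = build_biomart_query(
    dataset="hsapiens_snp",
    filters={"so_parent_name": "splice_donor_variant,splice_acceptor_variant"},
    attributes=[
        "refsnp_id",
        "chr_name",
        "chrom_start",
        "allele",
        "minor_allele",
        "minor_allele_freq",
    ],
)
print(query_xml)
```

The sketch only builds the query text and does not contact any server. For a result set this large, the compressed web file export is likely still the safer way to fetch the data.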

modified 4.6 years ago • written 4.6 years ago by Michael Dondrup46k

That query is going to return 30940 variants. If you do it that way, I suggest you download your data using "Export results to compressed web file (notify by email)". If you try to download that directly, it'll just stop partway through due to a lost connection and you won't get all your data.

written 4.6 years ago by Emily_Ensembl19k

Yes, I can confirm this. Using "compressed web file" (.gz) with notify by email, it took only a few minutes until the download was ready.
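
Once the .gz file arrives, it can be read without unpacking it first. The following is a minimal sketch, assuming the export is a tab-separated file with a header line. The file name, the column names, and the demo row are made-up placeholders, and `read_biomart_tsv` is a hypothetical helper.

```python
import csv
import gzip
import os
import tempfile


def read_biomart_tsv(path):
    """Yield one dict per row from a gzipped, tab-separated export with a header."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")


# Demo on a tiny stand-in file; the column names and the row are placeholders.
demo_path = os.path.join(tempfile.mkdtemp(), "mart_export.txt.gz")
with gzip.open(demo_path, mode="wt", encoding="utf-8", newline="") as out:
    out.write("Variant name\tChromosome/scaffold name\n")
    out.write("rs0000001\t1\n")

rows = list(read_biomart_tsv(demo_path))
print(len(rows), rows[0]["Variant name"])
```

Reading row by row keeps memory use flat, which helps with exports of tens of thousands of variants.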

written 4.6 years ago by Michael Dondrup46k
4.6 years ago by
Parth Patel40
United States
Parth Patel40 wrote:

Thanks Michael and Emily! This has made the process so much easier. I still need the genomic reference nucleotide for each variant. I understand that to get a true reference, I would need to specify populations and look into allele frequencies. Would using the "ancestral allele" serve as a good surrogate?

To provide some context: I am trying to obtain a set of benign splice site mutations. To do this, I set a filter for minor allele frequencies > 0.05. I need the reference and alternate nucleotides and their positions so that I can gauge the effectiveness of programs that predict the pathogenicity of splice site mutations. I believe the ancestral allele should be a decent surrogate because I assume the most common allele in the chimp genome is reported, but I still have some reservations.
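
The same frequency cutoff can also be applied after download. Below is a minimal sketch, assuming each exported row is a dict with the frequency stored as text. The column name "Minor allele frequency" and the example rows are assumptions made for illustration, and `is_common` is a hypothetical helper.

```python
def is_common(row, threshold=0.05, maf_column="Minor allele frequency"):
    """Return True if the row's minor allele frequency is strictly above threshold."""
    raw = (row.get(maf_column) or "").strip()
    if not raw:
        # No frequency reported: treat the variant as not passing the filter.
        return False
    try:
        return float(raw) > threshold
    except ValueError:
        # Non-numeric value in the column: skip the row rather than fail.
        return False


# Made-up rows for illustration only.
example_rows = [
    {"Variant name": "var_a", "Minor allele frequency": "0.12"},
    {"Variant name": "var_b", "Minor allele frequency": "0.01"},
    {"Variant name": "var_c", "Minor allele frequency": ""},
]
common = [r["Variant name"] for r in example_rows if is_common(r)]
print(common)
```

Rows with a missing frequency are dropped here, which is one possible choice. Whether to keep them depends on how strict the benign set needs to be.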

Thanks!

Parth

written 4.6 years ago by Parth Patel40

You can get the alleles and the minor allele as attributes in BioMart. If you know ref/alt and the minor allele, then the other one is the major allele. Ancestral is also useful.
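
The reasoning above can be sketched in a few lines of Python. This assumes the allele attribute is a slash-separated string such as "A/G" and only handles variants with exactly two distinct alleles. `major_allele` is a hypothetical helper, not part of BioMart.

```python
def major_allele(alleles, minor):
    """Return the allele that is not the minor one, or None if it cannot be decided.

    alleles: slash-separated allele string, e.g. "A/G"
    minor:   the reported minor allele, e.g. "G"
    """
    parts = [a.strip() for a in alleles.split("/") if a.strip()]
    # Only biallelic variants have a single unambiguous "other" allele.
    if len(parts) != 2 or len(set(parts)) != 2 or minor not in parts:
        return None
    return parts[0] if parts[1] == minor else parts[1]


print(major_allele("A/G", "G"))    # A
print(major_allele("A/G", "A"))    # G
print(major_allele("A/G/T", "T"))  # None
```

Variants with more than two alleles return None, because the minor allele alone does not say which of the remaining alleles is the most frequent.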

modified 4.6 years ago • written 4.6 years ago by Emily_Ensembl19k