Question: dbSNP: Mappings To Protein Sequence?
4
7.1 years ago by
Chris1.6k
Munich
Chris1.6k wrote:

Hey,

we are trying to get a local subset of dbSNP running on our group's servers. Since we are only interested in nsSNPs, we specifically need the mappings of rs# to protein sequence, i.e. the concrete RefSeq identifier, the sequence position and the mutant residue. According to the dbSNP handbook from NCBI, the organism-specific SNPContigLocusId tables seem to be of major interest, and indeed they have everything that we need. However, those tables only exist for 14 organisms out of 100 overall. Does that mean that these mappings to protein sequences don't exist for the vast majority of organisms? If so, why? Or could this information be stored somewhere else in the huge space of dbSNP tables?

Thanks for sharing any insights, Chris

dbsnp protein mapping snp • 1.8k views
modified 20 months ago by khulood4455920 • written 7.1 years ago by Chris1.6k
1

Are you interested in SNPs from all organisms, or only in a subset? Such mappings are available for human in various nsSNP annotation databases; I'm not sure about other organisms.

written 7.1 years ago by Khader Shameer17k

I'm interested in nsSNPs from all organisms that show up in dbSNP. Human is among the 14 organisms that have the mappings. Thanks, Chris

written 7.0 years ago by Chris1.6k

Hi Chris,

How is your mapping from nsSNPs to protein sequences going? I am working on a similar project right now. Did you find out why the mapping from nsSNPs to protein sequences is only available to such a limited extent?

written 3.2 years ago by ajingnk120
1
7.0 years ago by
Jan Kosinski1.6k
Jan Kosinski1.6k wrote:

In my group, a server has just been developed that does more or less what you want (if I understood your question correctly ;-).

http://www.biocomputing.it/picmi/

You can try the Nucleotide input option; see Help for a description of the input.

However, the output gives you the sequence position and the mutant residue on an Ensembl transcript, not on a RefSeq sequence. Ensembl transcripts do have links to RefSeq, but I don't know how to retrieve them automatically for high-throughput input.

Give it a try, and contact the authors if you need more.

written 7.0 years ago by Jan Kosinski1.6k

Thanks Jan, I'll give it a try. However, I'd really like to know why dbSNP only has these mappings for 14 organisms. There must be a reason for that. Chris

written 7.0 years ago by Chris1.6k
0
5.8 years ago by
User 63180
User 63180 wrote:

Hi, Chris! In my group, we are currently trying to build a human protein variant database generated from nsSNPs. We need to store the amino acid sequences of both the protein variant and the original protein. In the SNPContigLocusId tables I can only find protein_acc, the residue for the SNP allele and the position, but not the protein sequence. Where can I find and download all human protein variant sequences mapped from nsSNPs?

written 5.8 years ago by User 63180

Hi, the fields protein_acc and protein_ver are pointers to RefSeq. To get the corresponding sequences, go to the RefSeq FTP server and download the FASTA file [1] that contains all human sequences. This file normally does not contain every sequence that is referenced in dbSNP. You have to download the missing ones from NCBI case by case, e.g. by using Entrez.

[1] ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/H_sapiens/protein/protein.fa.gz
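
As a rough illustration of the steps above, here is a minimal Python sketch that indexes a RefSeq protein FASTA by accession.version and builds the variant sequence for one substitution. It is only a sketch under assumptions: the function names are made up for this example, the header parsing assumes either a plain `>NP_xxx.N` header or the older `>gi|...|ref|NP_xxx.N|` style, and whether the position in your table dump is 0-based or 1-based has to be checked against the dbSNP documentation (the `zero_based` flag is there for that reason). The accession `NP_TOY.1` is a toy placeholder, not a real record.

```python
from typing import Dict, Iterable, List, Optional


def accession_from_header(header: str) -> str:
    """Extract accession.version from a FASTA header line (assumed formats, see text)."""
    token = header[1:].split()[0]
    fields = token.split("|")
    # Older NCBI headers look like gi|<number>|ref|<accession.version>|
    if "ref" in fields and fields.index("ref") + 1 < len(fields):
        return fields[fields.index("ref") + 1]
    return token


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into a dict {accession.version: sequence}."""
    chunks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = accession_from_header(line)
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {acc: "".join(parts) for acc, parts in chunks.items()}


def apply_substitution(sequence: str, position: int, mutant: str,
                       reference: Optional[str] = None,
                       zero_based: bool = True) -> str:
    """Return the sequence with one residue replaced by the mutant residue."""
    index = position if zero_based else position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    # Optional sanity check: the table's reference residue should match the sequence.
    if reference is not None and sequence[index] != reference:
        raise ValueError(f"expected {reference} at {position}, found {sequence[index]}")
    return sequence[:index] + mutant + sequence[index + 1:]


# Toy data standing in for the downloaded FASTA file and one table row.
toy_fasta = [">gi|1|ref|NP_TOY.1| hypothetical toy protein", "MKTA", "YIAK"]
proteins = read_fasta(toy_fasta)
key = "NP_TOY" + "." + "1"  # protein_acc + "." + protein_ver
variant = apply_substitution(proteins[key], 3, "V", reference="A")
print(proteins[key], "->", variant)
```

For the real file, the lines can be read with `gzip.open("protein.fa.gz", "rt")` and passed to `read_fasta` directly. Accessions that raise a `KeyError` on lookup are the ones that would have to be fetched from NCBI individually.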

written 5.8 years ago by Chris1.6k
0
20 months ago by
khulood4455920 wrote:

Hi, I have a question in bioinformatics. I have a gene, IL8, which has the mutation TGC>TGG. How can I find it if the mutation is in codon 36?
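
One possible way to look at such a codon change, sketched in Python using the standard genetic code: translate both codons and, given a coding sequence, pull out the codon with a given number. The function names are invented for this example, the toy CDS is not the real IL8 sequence, and the code assumes that codon numbering starts at the start codon of the CDS (numbering from the mature protein would shift the position).

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_at(cds: str, codon_number: int) -> str:
    """Return the codon with the given 1-based number from a CDS starting at ATG."""
    if codon_number < 1:
        raise ValueError("codon numbers start at 1")
    start = 3 * (codon_number - 1)
    codon = cds[start:start + 3].upper()
    if len(codon) != 3:
        raise ValueError(f"CDS is too short for codon {codon_number}")
    return codon


def describe_codon_change(ref_codon: str, alt_codon: str, codon_number: int) -> str:
    """Translate both codons and classify the change at the protein level."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return f"p.{ref_aa}{codon_number}{alt_aa} ({kind})"


# The change from the question: TGC (Cys) to TGG (Trp) at codon 36.
print(describe_codon_change("TGC", "TGG", 36))

# Toy CDS (not IL8): check which codon sits at a given position.
toy_cds = "ATGAAATGC"
print(codon_at(toy_cds, 3))
```

With the real CDS of the transcript in hand, `codon_at(cds, 36)` shows whether codon 36 is actually TGC in that transcript before the change is interpreted.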

written 20 months ago by khulood4455920