A simple question about getting SNP info
9.6 years ago
genetype2 • 0

Hi - this is a very simple question. I am trying to use PolyPhen to score SNPs in batch. I have PolyPhen installed on my computer, and its input file has the following form:

P18887    399    R    Q
P18074    751    K    Q
P01023    1000    I    V
Q9BUG6    186    L    V
P15848    358    V    M
Q9UNQ9    110    V    I
P35568    158    P    R
P06241    445    I    F
P11245    286    G    E
P12259    1764    V    M
P51168    594    T    M
P16581    575    L    F
P08908    273    G    D
Q92889    706    I    T
Q92889    875    E    G
O75360    142    A    T
P11532    557    I    T
P00451    1260    D    E

which I'm sure is all familiar to you as the protein ID, position, reference amino acid, and substituted amino acid.

However, when I search dbSNP for a gene, I expected to be able to get a file in this format for that gene, and I am very surprised that this type of file is not readily available. How can I produce such a file? I am familiar with Python/R and Biopython/Bioconductor, but SNPs are new to me.

For example I see a table like this: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?geneId=3603

where the table contains two of the needed fields - the substituted amino acid and the position - but not the protein ID or the reference amino acid.

Thank you.

polyphen SNP • 1.8k views
9.6 years ago
Ram 43k

IMO dbSNP stores SNPs, which are nucleotide changes. You are looking for a specific type of nucleotide change - a non-synonymous SNP in a coding region (an SNP that causes an AA mutation).

I use PolyPhen to predict possible effects of mutations in proteins I analyze at my lab. That means I already have a protein sequence and know the mutation I'm interested in (like R399Q in P18887), so running that information through PolyPhen makes sense.
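
If you already have mutations written in the usual R399Q style, turning them into the four-column input from the question is a small string-parsing job. Here is a minimal sketch; the function name `to_polyphen_row` is made up for illustration, and it assumes tab-separated columns and plain missense changes only (no stop codons, indels, etc.):

```python
import re

# One-letter codes for the 20 standard amino acids. "*" (stop) is left out
# on purpose, so nonsense changes are rejected rather than passed through.
_MUTATION = re.compile(
    r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$"
)


def to_polyphen_row(accession, mutation):
    """Turn ("P18887", "R399Q") into the line "P18887<TAB>399<TAB>R<TAB>Q".

    Hypothetical helper: the tab separator is an assumption about the
    input file layout, not something taken from the PolyPhen docs.
    """
    match = _MUTATION.match(mutation.strip().upper())
    if match is None:
        raise ValueError(f"not a simple missense mutation: {mutation!r}")
    ref, position, alt = match.groups()
    return "\t".join([accession, position, ref, alt])


# The first two rows of the example input from the question.
rows = [
    to_polyphen_row("P18887", "R399Q"),
    to_polyphen_row("P18074", "K751Q"),
]
print("\n".join(rows))
```

Writing `rows` to a file, one per line, gives you something shaped like the input shown in the question.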

If you're looking to run PolyPhen on all coding SNPs for a protein, you might wanna check out UniProt (or any such well-annotated protein resource) to get your list. Else, you're looking at a custom intermediate step to convert nucleotide variant information to AA mutation information before you can get to PolyPhen.
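
To give an idea of what that intermediate step could look like, here is a rough sketch that translates a single-base change in a coding sequence into an amino acid change. It assumes you already have the spliced, forward-strand CDS for the protein and the 1-based position of the SNP within that CDS, and it uses the standard genetic code. The function names and the toy sequence are made up for illustration; mapping genomic coordinates onto the CDS is the hard part and is not covered here.

```python
from itertools import product

# Standard genetic code, built from the compact NCBI-style layout
# (bases in TCAG order, third codon position varying fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def coding_snp_to_aa_change(cds, cds_position, alt_base):
    """Return (aa_position, ref_aa, alt_aa) for a single-base change.

    cds          -- coding sequence, forward strand, starting at the ATG
    cds_position -- 1-based position of the changed base within the CDS
    alt_base     -- the substituted nucleotide
    """
    cds = cds.upper()
    alt_base = alt_base.upper()
    index = cds_position - 1
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if not 0 <= index < usable_length:
        raise ValueError("position falls outside the complete codons of the CDS")
    if alt_base not in BASES:
        raise ValueError(f"unexpected base: {alt_base!r}")

    offset = index % 3                 # position of the SNP inside its codon
    codon_start = index - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    return codon_start // 3 + 1, CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


def is_missense(ref_aa, alt_aa):
    """True for an amino acid substitution that is neither silent nor a stop."""
    return ref_aa != alt_aa and "*" not in (ref_aa, alt_aa)


# Toy CDS (hypothetical): ATG CGG AAA = Met Arg Lys.
# Changing base 5 (the middle G of CGG) to A gives CAG, i.e. Arg -> Gln.
aa_position, ref_aa, alt_aa = coding_snp_to_aa_change("ATGCGGAAA", 5, "A")
if is_missense(ref_aa, alt_aa):
    print(aa_position, ref_aa, alt_aa)
```

Only the changes that pass `is_missense` are worth sending to PolyPhen, since synonymous and nonsense SNPs have no substituted amino acid to score.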


Hi Ram - please see the link I have updated my post with. It points to a table from dbSNP that I am working on. Each SNP ID (rsxxxxxxxxx) appears to have an amino acid substitution (unless it is nonsense). So I am looking for the equivalent of this table in PolyPhen format.


Hi, the contents of this table look like they're being computed on the fly, and are not available for download as a text file. You might wanna look for dbSNP remote querying features or use the genome browser.
