Question

A simple question regarding getting snp info

9.6 years ago

genetype2 • 0

Hi - this is a very simple question. I am trying to use PolyPhen to score SNPs in batch. I have PolyPhen installed on my computer, and the input file it takes has the following form:

P18887    399    R    Q
P18074    751    K    Q
P01023    1000    I    V
Q9BUG6    186    L    V
P15848    358    V    M
Q9UNQ9    110    V    I
P35568    158    P    R
P06241    445    I    F
P11245    286    G    E
P12259    1764    V    M
P51168    594    T    M
P16581    575    L    F
P08908    273    G    D
Q92889    706    I    T
Q92889    875    E    G
O75360    142    A    T
P11532    557    I    T
P00451    1260    D    E

which I'm sure is familiar to you as the protein ID, position, reference amino acid and substituted amino acid.
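For anyone generating such a file by script, here is a minimal sketch of a parser/validator for the four-column layout shown above. It assumes only what is visible in the post (whitespace-separated protein ID, 1-based position, and two single-letter amino acids); `Substitution` and `parse_line` are hypothetical names made up for this example, not part of PolyPhen.

```python
from typing import NamedTuple

# The 20 standard amino acids, single-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


class Substitution(NamedTuple):
    protein_id: str
    position: int
    ref_aa: str
    alt_aa: str


def parse_line(line: str) -> Substitution:
    """Parse one 'ID  position  ref  alt' row and check it looks sane."""
    fields = line.split()  # any run of spaces or tabs separates the columns
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    protein_id, pos, ref_aa, alt_aa = fields
    if not pos.isdigit() or int(pos) < 1:
        raise ValueError(f"position must be a positive integer: {pos!r}")
    if ref_aa not in AMINO_ACIDS or alt_aa not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code in: {line!r}")
    return Substitution(protein_id, int(pos), ref_aa, alt_aa)


if __name__ == "__main__":
    print(parse_line("P18887    399    R    Q"))
```

Running every generated row through a check like this before handing the file to PolyPhen catches stop codons or malformed positions early.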

However, when I search dbSNP for a gene, I expected to be able to get a file in this format for that gene, and I am very surprised that this type of file is not readily available. How can I produce such a file? I am familiar with Python/R and Biopython/Bioconductor, but SNPs are new to me.

For example I see a table like this: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?geneId=3603

The table contains two of the needed fields, the substituted amino acid and the position, but not the protein ID or the reference amino acid.

Thank you.

polyphen SNP • 1.8k views

updated 2.3 years ago by Ram 43k • written 9.6 years ago by genetype2 • 0

Ram · Answer 1 · 2014-09-25

9.6 years ago

Ram 43k

IMO dbSNP stores SNPs, which are nucleotide changes. You are looking for a specific type of nucleotide change: a non-synonymous SNP in a coding region (a SNP that causes an amino acid change).

I use PolyPhen to predict possible effects of mutations in proteins I analyze at my lab. That means I start from a protein sequence and already know the mutation I'm interested in (like R399Q in P18887), so passing this info to PolyPhen is straightforward.
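If your mutations are already written in the usual short form (like R399Q), turning them into rows of the format in the question is mostly string handling. A rough sketch, assuming single-letter codes and a tab-separated output; `mutation_to_row` is a hypothetical helper name, not something PolyPhen provides:

```python
import re

# ref amino acid, 1-based position, substituted amino acid, e.g. "R399Q"
_AA = "ACDEFGHIKLMNPQRSTVWY"
_MUTATION = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def mutation_to_row(protein_id: str, mutation: str) -> str:
    """Turn ('P18887', 'R399Q') into 'P18887<TAB>399<TAB>R<TAB>Q'."""
    match = _MUTATION.match(mutation.strip().upper())
    if match is None:
        raise ValueError(f"not a simple substitution: {mutation!r}")
    ref_aa, position, alt_aa = match.groups()
    return "\t".join([protein_id, position, ref_aa, alt_aa])


if __name__ == "__main__":
    print(mutation_to_row("P18887", "R399Q"))
```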

If you're looking to run PolyPhen on all coding SNPs for a protein, you might wanna check out UniProt (or any similarly well-annotated protein resource) to get your list. Otherwise, you're looking at a custom intermediate step to convert nucleotide variant information to AA mutation information before you can get to PolyPhen.
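To give an idea of what that intermediate step involves, here is a bare-bones sketch. It assumes you already have the coding sequence (CDS) on the coding strand, starting at the start codon, that the SNP position is given as a 1-based offset into that CDS, and that the standard genetic code applies. Real pipelines also have to deal with strand, transcript choice and genomic-to-CDS coordinates, none of which is handled here. The function name is made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def coding_snp_to_substitution(cds: str, cds_pos: int, alt_base: str):
    """Return (protein position, ref AA, alt AA) for a single-base change.

    cds_pos is the 1-based position of the changed base within the CDS.
    """
    cds = cds.upper()
    alt_base = alt_base.upper()
    if not 1 <= cds_pos <= len(cds):
        raise ValueError("cds_pos is outside the coding sequence")
    if alt_base not in BASES:
        raise ValueError(f"unexpected base: {alt_base!r}")

    index = cds_pos - 1                # 0-based index of the changed base
    offset = index % 3                 # position of that base within its codon
    codon_start = index - offset
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) < 3:
        raise ValueError("CDS ends in an incomplete codon")
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]

    return codon_start // 3 + 1, CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


if __name__ == "__main__":
    # Toy CDS: ATG CGG CAG; changing base 5 (G -> A) turns CGG (R) into CAG (Q).
    print(coding_snp_to_substitution("ATGCGGCAG", 5, "A"))
```

Before writing rows for PolyPhen you would drop results where the two amino acids are identical (synonymous) or where the substituted one is `*` (nonsense).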

2.3 years ago by Ram 43k

Hi Ram - please see the link I have updated my post with. It contains a table from dbSNP that I am working on. Each SNP ID (rsxxxxxxxxx) appears to have an amino acid substitution (unless it is nonsense). So I am looking for the equivalent of this table in PolyPhen format.

updated 2.3 years ago by Ram 43k • written 9.6 years ago by genetype2 • 0

Hi, the contents of this table look like they're being computed on the fly, and are not available for download as a text file. You might wanna look for dbSNP remote querying features or use the genome browser.

2.3 years ago by Ram 43k