Question: A simple question regarding getting SNP info
5.9 years ago by
United Kingdom
genetype20 wrote:

Hi - this is a very simple question. I am trying to use PolyPhen to score SNPs in batch. I have PolyPhen installed on my computer, and the input file it takes has the following form:


P18887    399    R    Q
P18074    751    K    Q
P01023    1000    I    V
Q9BUG6    186    L    V
P15848    358    V    M
Q9UNQ9    110    V    I
P35568    158    P    R
P06241    445    I    F
P11245    286    G    E
P12259    1764    V    M
P51168    594    T    M
P16581    575    L    F
P08908    273    G    D
Q92889    706    I    T
Q92889    875    E    G
O75360    142    A    T
P11532    557    I    T
P00451    1260    D    E


which I'm sure is all familiar to you as the protein ID, position, reference amino acid and substituted amino acid.


However, when I search dbSNP for any gene, I expected to find a file in this format for that gene, and I am very surprised that this type of file is not readily available. How can I produce such a file? I am familiar with Python/R and Biopython/Bioconductor, but SNPs are new to me.


For example I see a table like this:

where the table contains two of the needed fields (the substituted amino acid and the position), but not the protein ID or the reference amino acid.


Thank you.

snp polyphen • 1.2k views
modified 5.9 years ago • written 5.9 years ago by genetype20
5.9 years ago by
Houston, TX
RamRS28k wrote:

IMO dbSNP stores SNPs, which are nucleotide changes. You are looking for a specific type of nucleotide change - a non-synonymous SNP in a coding region (an SNP that causes an AA mutation).

I use PolyPhen to predict possible effects of mutations in proteins I analyze at my lab. This means that I have a protein sequence and I know the mutation I'm looking for (like R399Q in P18887). Running this info through PolyPhen is then straightforward.
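As a minimal sketch of that last step, the snippet below turns a protein accession plus a shorthand mutation such as R399Q into one line of the four-column layout shown in the question. The function name `mutation_to_polyphen_line` is hypothetical, and the tab separator is an assumption based on the column layout in the question; check it against your PolyPhen version.

```python
import re

# Protein-level shorthand such as "R399Q":
# reference residue, 1-based position, substituted residue.
_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def mutation_to_polyphen_line(accession, mutation):
    """Return 'accession<TAB>position<TAB>ref<TAB>alt' for one mutation.

    Hypothetical helper; the tab-separated output is an assumption
    based on the example input file in the question.
    """
    match = _MUTATION.match(mutation.strip().upper())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {mutation!r}")
    ref_aa, position, alt_aa = match.groups()
    return "\t".join([accession, position, ref_aa, alt_aa])


if __name__ == "__main__":
    # The first two rows of the example file in the question.
    for acc, mut in [("P18887", "R399Q"), ("P18074", "K751Q")]:
        print(mutation_to_polyphen_line(acc, mut))
```

Nonsense or frameshift changes do not fit the pattern, so the function raises an error for them instead of guessing.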

If you're looking to run PolyPhen on all coding SNPs for a protein, you might wanna check out UniProt (or any such well-annotated protein resource) to get your list. Else, you're looking at a custom intermediate step to convert nucleotide variant information to AA mutation information before you can get to PolyPhen.
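A rough sketch of what such an intermediate step could look like, using only the Python standard library. It assumes you already have the coding sequence (CDS) on the forward strand, the 1-based position of the SNP within that CDS, and the standard genetic code; the function name `coding_snp_to_aa_change` is hypothetical. Mapping a genomic coordinate onto the CDS, and a transcript onto a UniProt accession, is not covered here.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def coding_snp_to_aa_change(cds, cds_position, alt_base):
    """Return (protein_position, ref_aa, alt_aa) for one coding SNP.

    Hypothetical helper. Assumes `cds` is a forward-strand DNA coding
    sequence starting at the first base of the start codon, and that
    `cds_position` is 1-based within that sequence.
    """
    cds = cds.upper()
    alt_base = alt_base.upper()
    if not 1 <= cds_position <= len(cds):
        raise ValueError("position lies outside the coding sequence")
    codon_index = (cds_position - 1) // 3  # 0-based codon number
    offset = (cds_position - 1) % 3        # position within the codon
    ref_codon = cds[codon_index * 3: codon_index * 3 + 3]
    if len(ref_codon) < 3:
        raise ValueError("SNP falls in an incomplete trailing codon")
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    return codon_index + 1, CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


if __name__ == "__main__":
    # Toy CDS: ATG CGG TAA. Changing base 5 (G to A) turns CGG into CAG.
    print(coding_snp_to_aa_change("ATGCGGTAA", 5, "A"))
```

If the reference and substituted amino acids come back identical, the SNP is synonymous and can be left out of the PolyPhen input; a `*` marks a stop codon.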

written 5.9 years ago by RamRS28k

Hi Ram - please see the link I have updated my post with. It contains a table I am working on from dbSNP. Each SNP ID (rsxxxxxxxxx) appears to have an amino acid substitution (unless nonsense). So I am looking for the equivalent PolyPhen format of this table.

written 5.9 years ago by genetype20

Hi, the contents of this table look like they're being computed on the fly, and are not available for download as a text file. You might wanna look for dbSNP remote querying features or use the genome browser.

written 5.9 years ago by RamRS28k