Hi - this is a very simple question. I am trying to use PolyPhen to score SNPs in batch. I have PolyPhen installed on my computer, and the input file it takes has the following form:
P18887 399 R Q
P18074 751 K Q
P01023 1000 I V
Q9BUG6 186 L V
P15848 358 V M
Q9UNQ9 110 V I
P35568 158 P R
P06241 445 I F
P11245 286 G E
P12259 1764 V M
P51168 594 T M
P16581 575 L F
P08908 273 G D
Q92889 706 I T
Q92889 875 E G
O75360 142 A T
P11532 557 I T
P00451 1260 D E
which I'm sure is familiar to you: the protein ID, position, reference amino acid and substituted amino acid.
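To make the format concrete, here is a minimal Python sketch that parses and sanity-checks one such line. The names `Substitution` and `parse_line` are made up for illustration and are not part of PolyPhen; the check against the 20 standard one-letter amino acid codes is my own assumption about what counts as a valid line.

```python
from typing import NamedTuple

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


class Substitution(NamedTuple):
    """One line of the input: protein ID, position, ref aa, substituted aa."""
    protein_id: str
    position: int
    ref_aa: str
    alt_aa: str


def parse_line(line: str) -> Substitution:
    """Parse a whitespace-separated line such as 'P18887 399 R Q'."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    protein_id, pos, ref_aa, alt_aa = fields
    position = int(pos)  # raises ValueError if the position is not an integer
    if position < 1:
        raise ValueError(f"position must be 1-based and positive: {line!r}")
    for aa in (ref_aa, alt_aa):
        if aa not in AMINO_ACIDS:
            raise ValueError(f"unknown amino acid code {aa!r} in {line!r}")
    return Substitution(protein_id, position, ref_aa, alt_aa)


record = parse_line("P18887 399 R Q")
print(record.protein_id, record.position, record.ref_aa, record.alt_aa)
```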
However, when I search dbSNP for a gene, I expected to be able to download a file in this format for that gene, and I am very surprised that this type of file is not readily available. How can I produce such a file? I am familiar with Python/R and Biopython/Bioconductor, but SNPs are new to me.
For example I see a table like this: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?geneId=3603
where the table contains two of the needed fields (the substituted amino acid and the position), but not the protein ID or the reference amino acid.
Thank you.
Hi Ram - please see the link I have added to my post. It points to a table from dbSNP that I am working with. Each SNP ID (rsxxxxxxxxx) appears to have an amino acid substitution (unless it is a nonsense variant). So I am looking for the equivalent of this table in PolyPhen format.
Hi, the contents of this table look like they're being computed on the fly, and are not available for download as a text file. You might want to look at dbSNP's remote querying features or use the genome browser.
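If you do manage to get the position and the substituted amino acid out of that table, one way to fill in the missing reference amino acid is to read it off the protein sequence yourself. Below is a minimal Python sketch of that idea. It assumes you have already obtained the protein sequence separately, and that the dbSNP position refers to that exact sequence; this may not hold if dbSNP numbers the residue on a different isoform or on a RefSeq protein that differs from the UniProt entry, so check a few by hand. The function name `polyphen_line`, the accession `TOY00001` and the toy sequence are made up for illustration.

```python
def polyphen_line(protein_id: str, sequence: str, position: int, alt_aa: str) -> str:
    """Build one 'ID position REF ALT' line, taking REF from the sequence.

    position is 1-based, as in the dbSNP table and the PolyPhen input.
    Assumption: the position is numbered on this exact sequence.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError(
            f"position {position} is outside the sequence (length {len(sequence)})"
        )
    ref_aa = sequence[position - 1]  # convert 1-based position to 0-based index
    if ref_aa == alt_aa:
        raise ValueError(
            f"residue {position} is already {ref_aa}; not an amino acid substitution"
        )
    return f"{protein_id} {position} {ref_aa} {alt_aa}"


# Toy example: hypothetical accession and sequence, not real data.
toy_sequence = "MKTAYIAKQR"
print(polyphen_line("TOY00001", toy_sequence, 3, "A"))
```

Raising an error when the reference and substituted residues match is a deliberate choice here: it usually means either a synonymous SNP or a position numbered on a different sequence, and both are worth looking at before sending the line to PolyPhen.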