Question: Obtaining SNP locations on protein sequences.
3.5 years ago by
Bioaln300
France
Bioaln300 wrote:

Hello. I'm a researcher in the field of proteomics. I'm currently trying to obtain SNP locations on protein sequences, along with the protein sequences themselves.

My question is: which database or service currently hosts protein SNP locations and lets me batch download the data for every protein? Is this even possible?

Thank you very much.

retrieval snp protein sequence • 1.0k views
3.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

This information is available in UniProt: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz. Each variant is recorded as a feature of type "sequence variant", for example:

(...)
  <feature description="In dbSNP:rs11542705." id="VAR_048095" type="sequence variant">
    <original>M</original>
    <variation>I</variation>
    <location>
      <position position="155"/>
    </location>
  </feature>
(...)
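
As a rough illustration of how the records above could be pulled out in bulk, here is a minimal Python sketch that streams a UniProt XML file and collects every "sequence variant" feature. It relies only on the element and attribute names visible in the excerpt (`feature`, `original`, `variation`, `position`), plus the assumption that each `entry` carries at least one `accession` child. The function names `local_name`, `iter_variants` and `variants_from_file` are made up for this example, and the accession `P00001` in the sample is a placeholder, not a real lookup.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return an element name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def iter_variants(xml_stream):
    """Yield one dict per 'sequence variant' feature found in the stream."""
    for _, entry in ET.iterparse(xml_stream, events=("end",)):
        if local_name(entry.tag) != "entry":
            continue
        # Assumption: the first <accession> child is the primary accession.
        accession = next(
            (el.text for el in entry if local_name(el.tag) == "accession"),
            None,
        )
        for feature in entry:
            if local_name(feature.tag) != "feature":
                continue
            if feature.get("type") != "sequence variant":
                continue
            record = {
                "accession": accession,
                "id": feature.get("id"),
                "description": feature.get("description"),
                "original": None,
                "variation": None,
                # Stays None when the location is not a single <position>.
                "position": None,
            }
            for el in feature.iter():
                name = local_name(el.tag)
                if name in ("original", "variation"):
                    record[name] = el.text
                elif name == "position" and el.get("position") is not None:
                    record["position"] = int(el.get("position"))
            yield record
        entry.clear()  # drop the processed entry's children to limit memory


def variants_from_file(path):
    """Convenience wrapper for a gzipped dump such as uniprot_sprot.xml.gz."""
    with gzip.open(path, "rb") as handle:
        yield from iter_variants(handle)


# Small in-memory sample modelled on the excerpt above (placeholder accession).
SAMPLE = b"""<uniprot xmlns="http://uniprot.org/uniprot">
<entry>
  <accession>P00001</accession>
  <feature description="In dbSNP:rs11542705." id="VAR_048095" type="sequence variant">
    <original>M</original>
    <variation>I</variation>
    <location>
      <position position="155"/>
    </location>
  </feature>
</entry>
</uniprot>"""

rows = list(iter_variants(io.BytesIO(SAMPLE)))
for row in rows:
    print(row["accession"], row["id"], row["original"], row["position"], row["variation"])
```

For the real file, `variants_from_file("uniprot_sprot.xml.gz")` would stream the download without loading it into memory at once. Namespaces are stripped rather than hard-coded so the sketch does not depend on the exact namespace URI of a given release.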

 


Sorry for the late reply. I tried parsing this with Python and successfully retrieved all of the variants. Is there any way to obtain only nsSNPs, or only those associated with pathogenic effects?

Reply written 3.5 years ago by Bioaln
3.4 years ago by
Belgium
Ibrahim Tanyalcin930 wrote:

You can use a tool we recently published in Bioinformatics: I-PV (http://i-pv.org/). It plots your protein sequence along with SNVs, their PolyPhen and SIFT scores, indels, the amino acid sequence, the chemical properties of the residues and the corresponding codons. You will be able to see the possible point mutations at each location and the distribution of substitutions from one set of amino acids to another. I have uploaded a set of introductory videos to the I-PV website. Here is one of them: http://i-pv.org/intro_ipv_alt4.html

You will need the FASTA files of your mRNA (NM_...) and protein sequence. You will also need a text file of conservation scores separated by newline characters. (You can upload a dummy conservation file of random numbers if you like.) Lastly, you will need the variant file for your SNVs, which you can download from BioMart (http://www.ensembl.org/biomart/martview/) for your protein of interest. Alternatively, you can use a VCF file.
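
If you go the dummy-file route, a minimal Python sketch along these lines could generate it. It assumes one score per residue, with values between 0 and 1; neither the expected count nor the expected range is stated above, so check them against the I-PV documentation. The sequence and the helper names `read_fasta_sequence` and `dummy_conservation` are made up for illustration.

```python
import random


def read_fasta_sequence(lines):
    """Concatenate the sequence lines of the first FASTA record."""
    parts = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # stop at the start of a second record
            seen_header = True
            continue
        if line:
            parts.append(line)
    return "".join(parts)


def dummy_conservation(length, seed=0):
    """Return `length` pseudo-random scores in [0, 1) (assumed range)."""
    rng = random.Random(seed)
    return [round(rng.random(), 3) for _ in range(length)]


# Made-up protein record; replace with the lines of your own FASTA file.
fasta_lines = """>example_protein (made-up sequence)
MKTAYIAKQR
QISFVKSHFS
""".splitlines()

sequence = read_fasta_sequence(fasta_lines)
scores = dummy_conservation(len(sequence))

# One score per line, as a newline-separated text file would contain.
text = "\n".join(str(score) for score in scores) + "\n"
print(text, end="")
```

Writing `text` to a file, for example with `open("conservation.txt", "w")`, gives a placeholder that can be uploaded in place of real conservation scores.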

The resulting image is interactive, and you can still plot or hide data on it using the highlight tool or the drop-down menus. To get an idea of what the output looks like and whether it fits what you want, take a look at some examples:

http://i-pv.org/FOXP2.html

http://i-pv.org/MYOSIN2.html

I hope this helps,

Good luck with your research,


Thanks for the answer. Were those made with Circos?

Reply written 3.4 years ago by Bioaln

Dear Bioaln,

Correct, the software is built on top of Circos. It is a combination of Circos and JavaScript. However, you do not have to generate the data tracks yourself; they are generated automatically from the FASTA files you provide. The output opens in a browser, and when you click on a SNP, it takes you to the corresponding dbSNP page for further information, as in this example:

http://i-pv.org/gifs/snpToDbsnp.gif

I hope this helps,

Reply written 3.4 years ago by Ibrahim Tanyalcin