Get the SNP sequence from dbSNP programmatically
6.9 years ago
Zhilong Jia ★ 2.0k

How can I get the sequence of a SNP from dbSNP/PubMed?

For example,

http://www.ncbi.nlm.nih.gov/snp/?term=rs1173745

I would like to get the sequence:

TATTGACAAGTTTAGATTTGGGGCT[C/T]ATATTTTTTCAGTTGGGACTCCTGT

(Please keep the [C/T].) Thank you.

SNP dbSNP • 2.4k views

When I run

$ xsltproc stylesheet "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=1173745,25&retmode=xml"

I get the following error:

warning: failed to load external entity "stylesheet"
cannot parse stylesheet

Can anyone help me solve this problem? Thank you.


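1) My answer is 4 years old; eutils has since moved to HTTPS, so use https:// in the URL.

2) "stylesheet" is, of course, the name of the XSLT file: save the stylesheet from the answer to a file and pass that file's path as the first argument.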

6.9 years ago

You can get the FASTA sequence with NCBI efetch:

~$ curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=1173745&rettype=fasta"
>gnl|dbSNP|rs1173745 rs=1173745|pos=26|len=51|taxid=9606|mol="genomic"|class=snp|alleles="C/T"|build=142|suspect=?|GMAF=T:145:0.029
TATTGACAAG TTTAGATTTG GGGCT
Y
ATATTTTTTC AGTTGGGACT CCTGT
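
The FASTA header above carries both `pos` and `alleles`, so the bracket notation can also be rebuilt after download. Here is a minimal Python sketch, assuming the header layout shown above and a single-base variant encoded as one IUPAC letter; the function name `fasta_to_bracketed` is hypothetical, not part of any NCBI tool.

```python
import re

def fasta_to_bracketed(record: str) -> str:
    """Rebuild 'FLANK[A/B]FLANK' from one dbSNP FASTA record (single-base variant)."""
    lines = record.strip().splitlines()
    header, body = lines[0], lines[1:]
    # pos is taken to be the 1-based position of the variant in the sequence
    pos = int(re.search(r"\bpos=(\d+)", header).group(1))
    alleles = re.search(r'alleles="([^"]+)"', header).group(1)
    # The sequence is wrapped and space-separated; strip all whitespace
    seq = "".join("".join(body).split())
    # Replace the IUPAC letter at pos with the bracketed alleles
    return f"{seq[:pos - 1]}[{alleles}]{seq[pos:]}"

# Record copied from the efetch output shown in this answer
record = """>gnl|dbSNP|rs1173745 rs=1173745|pos=26|len=51|taxid=9606|mol="genomic"|class=snp|alleles="C/T"|build=142|suspect=?|GMAF=T:145:0.029
TATTGACAAG TTTAGATTTG GGGCT
Y
ATATTTTTTC AGTTGGGACT CCTGT"""

print(fasta_to_bracketed(record))
```

For the record above this prints the 25-base flanks around `[C/T]`. Indels or multi-base variants would need different handling.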

To keep the [C/T] in the sequence, transform the XML output with the following XSLT stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:S="http://www.ncbi.nlm.nih.gov/SNP/docsum" version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:apply-templates select="/S:ExchangeSet/S:Rs"/>
  </xsl:template>
  <xsl:template match="S:Rs">
    <xsl:text>&gt;rs</xsl:text>
    <xsl:value-of select="@rsId"/>
    <xsl:text>
</xsl:text>
    <xsl:apply-templates select="S:Sequence"/>
    <xsl:text>
</xsl:text>
  </xsl:template>
  <xsl:template match="S:Sequence">
    <xsl:value-of select="S:Seq5"/>
    <xsl:text>[</xsl:text>
    <xsl:value-of select="S:Observed"/>
    <xsl:text>]</xsl:text>
    <xsl:value-of select="S:Seq3"/>
  </xsl:template>
</xsl:stylesheet>

Usage (the first argument, "stylesheet", is the name of the file containing the XSLT above):

$ xsltproc stylesheet "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=1173745,25&retmode=xml" 

>rs1173745
TATTGACAAGTTTAGATTTGGGGCT[C/T]ATATTTTTTCAGTTGGGACTCCTGT
>rs25
TCTGTGAGCTTCTGCATGCAATCCT[A/G]TGCAATTGGAATTTGATAGTCCTTT
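
If xsltproc is not available, roughly the same transformation can be sketched in Python with the standard library. This assumes the XML uses the namespace and the element names (`ExchangeSet`, `Rs`, `Sequence`, `Seq5`, `Observed`, `Seq3`) that the stylesheet above matches. The sample document below is a hand-written stand-in with that shape, not real efetch output, and `rs_to_bracketed` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the XSLT stylesheet above
NS = {"S": "http://www.ncbi.nlm.nih.gov/SNP/docsum"}

def rs_to_bracketed(xml_text: str) -> str:
    """Emit '>rsID' and 'SEQ5[OBSERVED]SEQ3' lines for each Rs element."""
    root = ET.fromstring(xml_text)  # expected to be the ExchangeSet element
    out = []
    for rs in root.findall("S:Rs", NS):
        seq = rs.find("S:Sequence", NS)
        out.append(">rs" + rs.get("rsId"))
        out.append(
            seq.findtext("S:Seq5", default="", namespaces=NS)
            + "[" + seq.findtext("S:Observed", default="", namespaces=NS) + "]"
            + seq.findtext("S:Seq3", default="", namespaces=NS)
        )
    return "\n".join(out)

# Hand-written stand-in for the efetch XML (assumed structure)
sample = """<ExchangeSet xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum">
  <Rs rsId="1173745">
    <Sequence>
      <Seq5>TATTGACAAGTTTAGATTTGGGGCT</Seq5>
      <Observed>C/T</Observed>
      <Seq3>ATATTTTTTCAGTTGGGACTCCTGT</Seq3>
    </Sequence>
  </Rs>
</ExchangeSet>"""

print(rs_to_bracketed(sample))
```

To run it on real data, the XML text would first be downloaded from the efetch URL (over HTTPS) and passed to the function.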

Pierre - is it possible to get the amino acid sequence instead?


A SNP is a variation in a genomic (DNA) sequence. If you just want the predicted effect on the protein, see tools like VEP, snpEff, etc.


I'm interested in predicting secondary structure changes caused by SNPs, and this seemed like an easy way to obtain the amino acid sequence with the SNP already substituted in, rather than having to get the reference protein sequence and "parse in" the mutation myself.
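
For a single missense change, "parsing in" the mutation is a short step once the protein position and residues are known, for example from the output of a tool such as VEP or snpEff. Here is a minimal Python sketch; the function name `apply_substitution` is hypothetical and the sequence is a made-up toy example, not a real protein.

```python
def apply_substitution(protein: str, position: int, ref: str, alt: str) -> str:
    """Return the protein with the residue at 1-based `position` changed from ref to alt."""
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    found = protein[position - 1]
    # Guard against using the wrong transcript or an off-by-one position
    if found != ref:
        raise ValueError(f"expected {ref} at position {position}, found {found}")
    return protein[:position - 1] + alt + protein[position:]

# Toy sequence and variant (A4V), for illustration only
mutated = apply_substitution("MKTAYIAK", 4, "A", "V")
print(mutated)  # MKTVYIAK
```

The reference-residue check is the important part: it fails loudly if the annotated position does not match the protein sequence being edited.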
