Question: Is It Possible To Get Uniprot Ft Features Without Using The Available Flat Files
4
7.5 years ago, Fred Fleche (4.3k; Paris, France) wrote:

Hello,

I am interested in getting the FT features for a bunch of kinases. For instance, for AKT1 I would get "FT DOMAIN 150 408 Protein kinase." from the file available at http://www.uniprot.org/uniprot/P31749.txt.

So I was wondering whether parsing the dedicated UniProt text annotation file is the only way to get this information, or whether it is also stored in a publicly accessible database.

Thanks in advance for your suggestions.

database mysql uniprot parsing • 3.4k views
modified 5.6 years ago by Biostar (20) • written 7.5 years ago by Fred Fleche (4.3k)
4
7.5 years ago, Jerven (640) wrote:

There are quite a few ways to get this information out of the UniProt website. Please write to help@uniprot.org.

For example, you can get it via the REST interface. I am out of the office and won't have time to write a complete answer until next week (12th of December 2011).


Very, very nice! I can't wait for December 12th. I didn't know about this REST possibility.

Reply written 7.5 years ago by Fred Fleche (4.3k)

My colleague @Elisabeth_Gasteiger hopefully answered your question. @Pierre_Lindenbaum also gave a good answer.

Reply written 7.4 years ago by Jerven (640)
4
7.5 years ago, Damian Kao (15k; USA) wrote:

Here is the FAQ on using the REST interface: http://www.uniprot.org/faq/28

The batch service doesn't seem to let you output a custom tab format: http://www.uniprot.org/batch/

Assuming you have a file of accession IDs, one per line, you can fetch each entry with this Python script:

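# Python 3 version of the script (urllib2 no longer exists in Python 3).
import sys
import urllib.request

# Query URL: asks for the id, feature, domain and domains columns in tab format.
url = 'http://www.uniprot.org/uniprot?columns=id%2Cfeature%2Cdomain%2Cdomains&format=tab&query=accession%3A'

# The first command-line argument is a file with one accession ID per line.
with open(sys.argv[1]) as accFile:
    for line in accFile:
        acc = line.strip()
        if not acc:
            continue  # skip blank lines

        with urllib.request.urlopen(url + acc) as response:
            rows = response.read().decode('utf-8').strip().split('\n')

        # rows[0] is the header line; rows[1] is the entry, if one was returned.
        if len(rows) > 1:
            print(rows[1])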

Save it as yourName.py and run it with: python yourName.py accessionIDsList

This script goes through each accession ID in the list, requests the entry, and prints the feature, the count of domains, and the domain name in tab-delimited format. If you want to display other information, check the REST service FAQ and add your own columns to the URL.
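
To make the "add your own columns" step concrete, here is a minimal sketch that builds the same kind of query URL from a list of column names. The helper name build_query_url is hypothetical; the base URL and the column names are simply the ones used in the script above, and whether the service still accepts this form is an assumption.

```python
from urllib.parse import urlencode

# Base URL taken from the script above; whether the service still
# accepts this form is an assumption.
BASE_URL = 'http://www.uniprot.org/uniprot'


def build_query_url(accession, columns):
    """Return a tab-format query URL for one accession (hypothetical helper)."""
    params = {
        'columns': ','.join(columns),
        'format': 'tab',
        'query': 'accession:' + accession,
    }
    # urlencode percent-encodes ',' as %2C and ':' as %3A.
    return BASE_URL + '?' + urlencode(params)


if __name__ == '__main__':
    print(build_query_url('P31749', ['id', 'feature', 'domain', 'domains']))
```

Changing the list passed as columns is then the only edit needed to request different fields.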

3
7.5 years ago, Pierre Lindenbaum (120k; France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

You could use the following simple XSLT stylesheet:


<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:u="http://uniprot.org/uniprot"
    version='1.0'
    >

<xsl:output method="text" encoding="UTF-8"/>
<xsl:param name="temporary">temporary</xsl:param>

<xsl:template match="/">
  <xsl:apply-templates select="u:uniprot"/>
</xsl:template>

<xsl:template match="u:uniprot">
  <xsl:apply-templates select="u:entry"/>
</xsl:template>

<xsl:template match="u:entry">
  <xsl:variable name="name" select="u:name[1]"/>
  <xsl:for-each select="u:feature">
    <xsl:value-of select="$name"/>
    <xsl:text>    </xsl:text>
    <xsl:value-of select="@type"/>
    <xsl:text>    </xsl:text>
    <xsl:value-of select="@description"/>
    <xsl:text>    </xsl:text>
    <xsl:value-of select="@evidence"/>
    <xsl:text>    </xsl:text>
    <xsl:value-of select="@status"/>
    <xsl:text>    </xsl:text>
    <xsl:apply-templates select="u:location"/>
    <xsl:text>
</xsl:text>
  </xsl:for-each>
</xsl:template>

<xsl:template match="u:location[u:begin and u:end]">
  <xsl:value-of select="u:begin/@position"/>
  <xsl:text>    </xsl:text>
  <xsl:value-of select="u:end/@position"/>
</xsl:template>

<xsl:template match="u:location[u:position]">
  <xsl:value-of select="u:position/@position"/>
  <xsl:text>    </xsl:text>
  <xsl:value-of select="u:position/@position"/>
</xsl:template>

</xsl:stylesheet>

Example:

xsltproc --novalid stylesheet.xsl http://www.uniprot.org/uniprot/P31749.xml

AKT1_HUMAN    chain    RAC-alpha serine/threonine-protein kinase            1    480
AKT1_HUMAN    domain    PH            5    108
AKT1_HUMAN    domain    Protein kinase            150    408
AKT1_HUMAN    domain    AGC-kinase C-terminal            409    480
AKT1_HUMAN    nucleotide phosphate-binding region    ATP        by similarity    156    164
AKT1_HUMAN    region of interest    Inositol-(1,3,4,5)-tetrakisphosphate binding            14    19
AKT1_HUMAN    region of interest    Inositol-(1,3,4,5)-tetrakisphosphate binding            23    25
AKT1_HUMAN    region of interest    Inhibitor binding            228    230
AKT1_HUMAN    active site    Proton acceptor        by similarity    274    274
AKT1_HUMAN    binding site    Inositol-(1,3,4,5)-tetrakisphosphate            53    53
(...)
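
For readers who prefer Python to XSLT, the same extraction can be sketched with the standard library's xml.etree.ElementTree. This is only an illustration: the function name extract_features is hypothetical, the sample XML is hand-written to mirror the elements the stylesheet above matches (it is not a real UniProt download), and the evidence and status columns are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the stylesheet above.
NS = {'u': 'http://uniprot.org/uniprot'}

# Hand-written sample for illustration only, not a real UniProt download.
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <name>AKT1_HUMAN</name>
    <feature type="domain" description="Protein kinase">
      <location><begin position="150"/><end position="408"/></location>
    </feature>
    <feature type="active site" description="Proton acceptor">
      <location><position position="274"/></location>
    </feature>
  </entry>
</uniprot>"""


def extract_features(xml_text):
    """Return (name, type, description, begin, end) tuples, one per feature."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.findall('u:entry', NS):
        name = entry.findtext('u:name', default='', namespaces=NS)
        for feature in entry.findall('u:feature', NS):
            begin = feature.find('u:location/u:begin', NS)
            end = feature.find('u:location/u:end', NS)
            single = feature.find('u:location/u:position', NS)
            if begin is not None and end is not None:
                span = (begin.get('position'), end.get('position'))
            elif single is not None:
                # Single-residue feature: begin and end are the same.
                span = (single.get('position'), single.get('position'))
            else:
                span = ('', '')
            rows.append((name, feature.get('type', ''),
                         feature.get('description', ''), *span))
    return rows


if __name__ == '__main__':
    for row in extract_features(SAMPLE_XML):
        print('\t'.join(row))
```

To run it on a real entry, the XML text would have to be downloaded first and passed to extract_features in place of SAMPLE_XML.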

Thanks a lot for sharing your XSLT answer, which works directly on the online XML files.

Reply written 7.5 years ago by Fred Fleche (4.3k)
2
7.5 years ago, Chris (190; Munich) wrote:

You can download the flat file containing all Swiss-Prot proteins here [1]. To parse that file, I'd use something like Biopython, which makes it easy to retrieve the feature section of each protein.

[1] ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
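
Biopython's Bio.SwissProt module is the usual route for this. As a dependency-free illustration of what parsing an FT line involves, here is a minimal sketch that handles only the single-line layout quoted in the question ("FT DOMAIN 150 408 Protein kinase."). The function name parse_ft_line is hypothetical, and continuation lines, uncertain positions and any other FT layout the flat file may use are deliberately not handled.

```python
import re

# Matches the layout quoted in the question: key, begin, end, description.
FT_LINE = re.compile(r'^FT\s+([A-Z_]+)\s+(\d+)\s+(\d+)\s*(.*)$')


def parse_ft_line(line):
    """Return (key, begin, end, description), or None if the line does not match."""
    match = FT_LINE.match(line.rstrip())
    if match is None:
        return None
    key, begin, end, description = match.groups()
    return key, int(begin), int(end), description


if __name__ == '__main__':
    print(parse_ft_line('FT DOMAIN 150 408 Protein kinase.'))
```

Lines that do not start with FT, or that lack two integer positions, return None, so the function can be applied to every line of an entry and the None results discarded.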


Thanks, Chris, but I am looking for an alternative way, as I said in my post.

Reply written 7.5 years ago by Fred Fleche (4.3k)
2
7.5 years ago, Elisabeth Gasteiger (1.6k; Geneva) wrote:

You could also use the GFF format (cf. http://biowiki.org/GffFormat).

examples:

single entry: http://www.uniprot.org/uniprot/P31749.gff

query: http://www.uniprot.org/uniprot/?query=AKT1&sort=score&format=gff

(see http://www.uniprot.org/faq/28)
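
Once downloaded, GFF output is easy to process because each feature is one line of tab-separated fields. The following is a minimal sketch under stated assumptions: the function name parse_gff is hypothetical, the column names follow the general nine-field GFF layout, and the sample line is hand-written for illustration rather than copied from a real UniProt response.

```python
# Column names follow the general GFF layout of nine tab-separated fields.
GFF_COLUMNS = ('seqid', 'source', 'type', 'start', 'end',
               'score', 'strand', 'phase', 'attributes')

# Hand-written sample line for illustration, not copied from a real download.
SAMPLE_GFF = 'P31749\tUniProtKB\tDomain\t150\t408\t.\t.\t.\tNote=Protein kinase'


def parse_gff(text):
    """Return one dict per feature line, skipping comments and blank lines."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith('#'):
            continue
        fields = line.split('\t')
        if len(fields) < len(GFF_COLUMNS):
            continue  # too few fields to be a feature line
        # zip() ignores any fields beyond the ninth column.
        record = dict(zip(GFF_COLUMNS, fields))
        record['start'] = int(record['start'])
        record['end'] = int(record['end'])
        records.append(record)
    return records


if __name__ == '__main__':
    for rec in parse_gff('##gff-version 3\n' + SAMPLE_GFF + '\n'):
        print(rec['type'], rec['start'], rec['end'], rec['attributes'])
```

Filtering the returned records on the type field then gives, for example, only the domain features.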

PS: To get a reply from the UniProt team, the best channel is to send an email to help@uniprot.org
