Is It Possible To Get UniProt FT Features Without Using The Available Flat Files
11.3 years ago

Hello,

I am interested in getting the FT features for a bunch of kinases. For instance, for AKT1 I would get "FT DOMAIN 150 408 Protein kinase." from the file available at http://www.uniprot.org/uniprot/P31749.txt.

So I was wondering whether parsing the dedicated UniProt text annotation file is the only way to get this information, or whether it is also stored in a publicly accessible database.

Thanks in advance for your suggestions.

uniprot parsing mysql database • 5.8k views
11.3 years ago
Jerven ▴ 650

There are quite a few ways to get this information out of the UniProt website. Please write to help@uniprot.org.

For example, this can be done via the REST interface. I am out of the office and won't have time to write a complete answer until next week (12 December 2011).


Very nice! I can't wait for December 12th. I didn't know about this REST option.


My colleague @Elisabeth_Gasteiger hopefully answered your question. @Pierre_Lindenbaum also gave a good answer.

11.3 years ago

Here is the FAQ on using the REST interface: http://www.uniprot.org/faq/28

The batch service doesn't seem to let you output custom tab format. http://www.uniprot.org/batch/

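Assuming you have a file of accession IDs (one per line), you can use the entry retrieval service with this Python script:

import sys
import urllib.request

# Query URL requesting the id, feature, domain and domains columns as tab-delimited text
url = 'http://www.uniprot.org/uniprot?columns=id%2Cfeature%2Cdomain%2Cdomains&format=tab&query=accession%3A'

# The accession list file is given as the first command-line argument
with open(sys.argv[1]) as accFile:
    for line in accFile:
        acc = line.strip()
        if not acc:
            continue  # skip blank lines

        # Read the body before closing the connection, then print it
        response = urllib.request.urlopen(url + acc)
        results = response.read().decode('utf-8')
        response.close()

        print(results)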


Save it as yourName.py and run it with: python yourName.py accessionIDsList

The script goes through each accession ID in the list, requests the entry, and prints the features, the count of domains, and the domain names in tab-delimited format. If you want to display other information, check the REST service FAQ and add your own columns to the URL.
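
To make "add in your own columns" concrete, here is a minimal sketch that assembles the same kind of query URL from a list of column names instead of editing the percent-escaped string by hand. The helper name `build_query_url` is made up for this example, and the base address is simply the one used in the script above, so it may have changed since this was written.

```python
from urllib.parse import urlencode

# Base address copied from the script above (assumption: still the same service)
BASE_URL = 'http://www.uniprot.org/uniprot'


def build_query_url(accession, columns):
    """Return a tab-format query URL for one accession and a list of column names.

    Hypothetical helper for illustration; not part of any UniProt client library.
    """
    params = {
        'query': 'accession:' + accession,
        'format': 'tab',
        'columns': ','.join(columns),
    }
    # urlencode percent-escapes ':' and ',' as %3A and %2C,
    # which is what the hand-written URL above contains
    return BASE_URL + '?' + urlencode(params)


if __name__ == '__main__':
    print(build_query_url('P31749', ['id', 'feature', 'domain', 'domains']))
```

The resulting string can be passed to the URL-opening call in the script in place of `url + acc`.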

11.3 years ago

You could use the following simple XSLT stylesheet:


<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
xmlns:u="http://uniprot.org/uniprot"
version='1.0'
>

<xsl:output method="text" encoding="UTF-8"/>
<xsl:param name="temporary">temporary</xsl:param>

<xsl:template match="/">
<xsl:apply-templates select="u:uniprot"/>
</xsl:template>

<xsl:template match="u:uniprot">
<xsl:apply-templates select="u:entry"/>
</xsl:template>

<xsl:template match="u:entry">
<xsl:variable name="name" select="u:name[1]"/>
<xsl:for-each select="u:feature">
<xsl:value-of select="$name"/>
<xsl:text>    </xsl:text>
<xsl:value-of select="@type"/>
<xsl:text>    </xsl:text>
<xsl:value-of select="@description"/>
<xsl:text>    </xsl:text>
<xsl:value-of select="@evidence"/>
<xsl:text>    </xsl:text>
<xsl:value-of select="@status"/>
<xsl:text>    </xsl:text>
<xsl:apply-templates select="u:location"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>

<xsl:template match="u:location[u:begin and u:end]">
<xsl:value-of select="u:begin/@position"/>
<xsl:text>    </xsl:text>
<xsl:value-of select="u:end/@position"/>
</xsl:template>

<xsl:template match="u:location[u:position]">
<xsl:value-of select="u:position/@position"/>
<xsl:text>    </xsl:text>
<xsl:value-of select="u:position/@position"/>
</xsl:template>

</xsl:stylesheet>


Example:

xsltproc --novalid stylesheet.xsl http://www.uniprot.org/uniprot/P31749.xml

AKT1_HUMAN    chain    RAC-alpha serine/threonine-protein kinase            1    480
AKT1_HUMAN    domain    PH            5    108
AKT1_HUMAN    domain    Protein kinase            150    408
AKT1_HUMAN    domain    AGC-kinase C-terminal            409    480
AKT1_HUMAN    nucleotide phosphate-binding region    ATP        by similarity    156    164
AKT1_HUMAN    region of interest    Inositol-(1,3,4,5)-tetrakisphosphate binding            14    19
AKT1_HUMAN    region of interest    Inositol-(1,3,4,5)-tetrakisphosphate binding            23    25
AKT1_HUMAN    region of interest    Inhibitor binding            228    230
AKT1_HUMAN    active site    Proton acceptor        by similarity    274    274
AKT1_HUMAN    binding site    Inositol-(1,3,4,5)-tetrakisphosphate            53    53
(...)
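
If xsltproc is not at hand, roughly the same extraction can be sketched in Python with the standard library. This is only an approximation of the stylesheet above: it assumes the same namespace and the same begin/end/position location elements, it leaves out the evidence and status columns, and the function name `feature_rows` and the trimmed sample document are made up for illustration (the sample values are taken from the output shown above).

```python
import xml.etree.ElementTree as ET

# Namespace used by the stylesheet above
NS = {'u': 'http://uniprot.org/uniprot'}


def feature_rows(xml_text):
    """Yield (entry name, type, description, begin, end) for every feature."""
    root = ET.fromstring(xml_text)
    for entry in root.findall('u:entry', NS):
        name = entry.findtext('u:name', default='', namespaces=NS)
        for feat in entry.findall('u:feature', NS):
            begin = end = ''
            loc = feat.find('u:location', NS)
            if loc is not None:
                pos = loc.find('u:position', NS)
                if pos is not None:
                    # Single-residue feature: begin and end are the same
                    begin = end = pos.get('position', '')
                else:
                    b = loc.find('u:begin', NS)
                    e = loc.find('u:end', NS)
                    begin = b.get('position', '') if b is not None else ''
                    end = e.get('position', '') if e is not None else ''
            yield (name, feat.get('type', ''),
                   feat.get('description', ''), begin, end)


# Hand-written, heavily trimmed sample; not a real download
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <name>AKT1_HUMAN</name>
    <feature type="domain" description="Protein kinase">
      <location><begin position="150"/><end position="408"/></location>
    </feature>
    <feature type="active site" description="Proton acceptor">
      <location><position position="274"/></location>
    </feature>
  </entry>
</uniprot>"""

if __name__ == '__main__':
    for row in feature_rows(SAMPLE):
        print('\t'.join(row))
```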


Thanks a lot for sharing your XSLT answer, which works directly on the online XML files.

11.3 years ago
Chris ▴ 190

You can download the flat file containing all Swiss-Prot proteins here [1]. To parse that file, I'd use something like Biopython, which makes it easy to retrieve the feature section of each protein.
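
To show what retrieving the feature section amounts to, here is a minimal standard-library sketch (it is not Biopython). It assumes the legacy fixed-column FT layout, where the feature key starts in column 6 and continuation lines leave that field blank; the function name `parse_ft_lines` is made up for this example. Biopython's `Bio.SwissProt` parser is the more robust choice for real files.

```python
def parse_ft_lines(lines):
    """Collect FT features from flat-file lines as a list of dicts.

    Assumes the legacy layout: 'FT', the key in columns 6-13,
    then the from/to positions, then a free-text description.
    """
    features = []
    for line in lines:
        if not line.startswith('FT'):
            continue
        key = line[5:13].strip()
        if key:
            # New feature: positions are kept as strings because the
            # flat file may use markers such as '?' or '<1'
            parts = line[13:].split(None, 2)
            if len(parts) < 2:
                continue
            features.append({
                'key': key,
                'from': parts[0],
                'to': parts[1],
                'description': parts[2].strip() if len(parts) > 2 else '',
            })
        elif features:
            # Continuation line: append its text to the previous description
            extra = line[2:].strip()
            previous = features[-1]['description']
            features[-1]['description'] = (previous + ' ' + extra).strip()
    return features


if __name__ == '__main__':
    sample = ['FT   DOMAIN      150    408       Protein kinase.']
    for feature in parse_ft_lines(sample):
        print(feature)
```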


Thanks, Chris, but I am looking for an alternative, as I said in my post.

11.3 years ago

You could also use the GFF format (cf. http://biowiki.org/GffFormat)

examples:

single entry: http://www.uniprot.org/uniprot/P31749.gff
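
Once a .gff file like the one above is downloaded, its lines can be split with a few lines of Python. The sketch below assumes the usual tab-separated GFF columns (sequence ID, source, type, start, end, score, strand, phase, attributes); the function name `parse_gff` and the sample line are illustrative only and are not copied from a real UniProt download.

```python
def parse_gff(text):
    """Return one dict per feature line of a GFF document."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and '#' comment / directive lines
        if not line.strip() or line.startswith('#'):
            continue
        cols = line.split('\t')
        if len(cols) < 8:
            continue  # not a feature line in the expected layout
        rows.append({
            'seqid': cols[0],
            'source': cols[1],
            'type': cols[2],
            'start': int(cols[3]),
            'end': int(cols[4]),
            'attributes': cols[8] if len(cols) > 8 else '',
        })
    return rows


# Illustrative sample modelled on the AKT1 kinase domain mentioned in the question
SAMPLE_GFF = (
    '##gff-version 3\n'
    'P31749\tUniProtKB\tDomain\t150\t408\t.\t.\t.\tNote=Protein kinase\n'
)

if __name__ == '__main__':
    for row in parse_gff(SAMPLE_GFF):
        print(row['seqid'], row['type'], row['start'], row['end'], row['attributes'])
```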

PS: To get a reply from the UniProt team, the best channel is to send an email to help@uniprot.org