How to map sub-cellular localisation to entries in a UniProt database FASTA file
7.0 years ago
wl284

I have a dataset of proteins that I have BLASTed against the UniProt Swiss-Prot database.

I'd now like to identify which proteins are likely to have a mitochondrial sub-cellular localisation, based on the sub-cellular localisation of their best BLAST hit in the Swiss-Prot database.
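A minimal sketch of the "best BLAST hit" step, assuming the results were saved in BLAST+ tabular format (-outfmt 6), that the hits for each query are listed best-first, and that the subject ID looks like sp|Q64602|AADAT_RAT. The function name best_hits and the file names are hypothetical; if your database was built so that the subject column holds only the accession, the fallback branch passes it through unchanged.

```shell
#!/bin/sh
# best_hits FILE
# FILE: BLAST+ tabular output (-outfmt 6); column 1 = query ID,
# column 2 = subject ID such as "sp|Q64602|AADAT_RAT".
# Assumption: hits for each query appear best-first, so only the first
# line seen for each query is kept.
# Output: query<TAB>accession, one line per query.
best_hits() {
    awk -F '\t' '!seen[$1]++ {
        n = split($2, id, "|")          # sp | ACCESSION | ENTRY_NAME
        acc = (n >= 3) ? id[2] : $2     # fall back to the raw subject ID
        print $1 "\t" acc
    }' "$1"
}
```

For example, best_hits blast_hits.tsv > best_hits.tsv writes one query/accession pair per line; cut -f2 best_hits.tsv then gives a list of accessions to look up.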

The FASTA headers of the UniProt proteins look like this:

">sp|Q64602|AADAT_RAT Kynurenine/alpha-aminoadipate aminotransferase, mitochondrial OS=Rattus norvegicus GN=Aadat PE=1 SV=1"

I have found a Gene Ontology mapping file (link below), but the FASTA headers don't contain the GO IDs needed to map them: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/external2go/uniprotkb_sl2go

Is there an intermediate file I need to use, and does anyone know where to find it? Any help would be appreciated.

7.0 years ago

Using XSLT:

$ awk -F '|' '/^>/ {print $2}' input.fa | while read -r ACN ; do curl -sL "https://www.uniprot.org/uniprot/${ACN}.xml" | xsltproc transform.xsl - ; done

Q64602  Mitochondrion

with transform.xsl:


Thanks, that's awesome.


I changed the protocol to HTTPS; otherwise the response could be empty, because UniProt moved to HTTPS and now answers plain HTTP requests with a "document moved" redirect.
