Retrieving Uniprot Protein Isoform Sequences Programmatically?
12.2 years ago
Pablo Pareja ★ 1.6k

Hi everyone, I've been searching for a way to retrieve protein isoform sequences programmatically, but I haven't succeeded so far. After parsing the whole UniProt TrEMBL and Swiss-Prot XML files, I've seen that the isoform sequences are not included, just the isoform definitions plus some extra information. Do you know if there's any kind of web service for this? Thanks in advance,

Pablo Pareja

uniprot xml isoform • 5.7k views
12.2 years ago
Jerven ▴ 650
http://www.uniprot.org/uniprot/<ANY_UNIPROT_ISOFORM_ID_THAT_YOU_HAVE>.fasta


Replace the part in CAPITALS with the isoform accession that you want to retrieve.

e.g.

http://www.uniprot.org/uniprot/P05067-2.fasta

This won't work for the cases where the isoform id is the canonical sequence and there is just one sequence in the record.

It should work for these cases as well from the week after next.
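For anyone who wants to script this, here is a minimal Python sketch of the URL pattern above. The function names, the timeout, and the loose accession check are my own assumptions, not part of any UniProt API, and the thread itself does not confirm that this URL still resolves today.

```python
import re
import urllib.request

# URL pattern quoted in the answer above.
BASE_URL = "http://www.uniprot.org/uniprot/"

# Assumption: a loose shape check (e.g. P05067 or P05067-2).
# UniProt's real accession grammar is stricter than this.
ISOFORM_ID = re.compile(r"^[A-Z0-9]+(-\d+)?$")


def isoform_fasta_url(isoform_id: str) -> str:
    """Build the FASTA URL for one isoform accession."""
    if not ISOFORM_ID.match(isoform_id):
        raise ValueError(f"unexpected accession format: {isoform_id!r}")
    return f"{BASE_URL}{isoform_id}.fasta"


def fetch_isoform_fasta(isoform_id: str, timeout: float = 30.0) -> str:
    """Download the FASTA text for one isoform (needs network access)."""
    url = isoform_fasta_url(isoform_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_isoform_fasta("P05067-2")` would request the example URL given above.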


Hi jerven. Welcome to Biostars! Your first link is broken and needs some mending. Cheers!


Hi jerven. Thanks for your answer. I assume, then, that this URL pattern should work from mid-February on. I have to ask anyway: is there a way to retrieve more than one isoform sequence at a time, e.g. generating a multi-FASTA file with several isoform sequences included? Cheers


Yes, this works now, but if you get back a 500 error, try again without the -\d+ suffix of the isoform ID (the trailing hyphen and digits, such as the -2 in P05067-2) to get the canonical sequence.
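The retry described here could be sketched roughly as follows in Python. The helper names and the injectable `get` parameter are hypothetical choices of mine (the latter just makes the logic testable offline), and the base URL is the one from the answer above.

```python
import re
import urllib.error
import urllib.request

BASE_URL = "http://www.uniprot.org/uniprot/"  # pattern from the answer above


def canonical_accession(isoform_id: str) -> str:
    """Strip a trailing -<digits> isoform suffix, e.g. P05067-2 -> P05067."""
    return re.sub(r"-\d+$", "", isoform_id)


def http_get(url: str) -> str:
    """Plain HTTP GET; urlopen raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def fetch_with_fallback(isoform_id: str, get=http_get) -> str:
    """Try the isoform ID first; on a 500, retry with the suffix removed."""
    try:
        return get(f"{BASE_URL}{isoform_id}.fasta")
    except urllib.error.HTTPError as err:
        stripped = canonical_accession(isoform_id)
        # Only retry on a 500, and only if there was a suffix to strip.
        if err.code != 500 or stripped == isoform_id:
            raise
        return get(f"{BASE_URL}{stripped}.fasta")
```

Any other HTTP error, or a 500 on an ID that has no suffix, is re-raised rather than hidden.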

12.2 years ago

Check out this section of the FAQ. Note that although it says they can be downloaded here, the link actually goes to the documentation of Paul Kersy's varsplic.pl program, which is usually found here.

12.2 years ago
Pablo Pareja ★ 1.6k

I hadn't realized until now that there's a file including every isoform sequence on the UniProt downloads site:

isoform fasta file

This may not be the best option for people who would rather use a web service, but in my case it fits my requirements perfectly. (With a really simple Java program I can extract all the information I need for any protein isoform.)
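As an illustration of that kind of extraction (in Python here, not the Java program mentioned above), a minimal sketch might look like this. It assumes UniProt-style headers of the form `>db|ACCESSION|ENTRY_NAME description`; the function names are made up for the example.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_of(header: str) -> str:
    """Assumes 'db|ACCESSION|NAME ...'; falls back to the first token."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else header.split()[0]


def isoforms_of(lines: Iterable[str], accession: str) -> Dict[str, str]:
    """Collect sequences whose accession is `accession` or `accession-N`."""
    found = {}
    for header, sequence in read_fasta(lines):
        acc = accession_of(header)
        if acc == accession or acc.startswith(accession + "-"):
            found[acc] = sequence
    return found
```

For a gzipped download, something like `with gzip.open(path, "rt") as handle: isoforms_of(handle, "P05067")` should work, where `path` is wherever you saved the file.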


Heads up: as of 2018 this file contains only non-canonical splice variants (not all isoforms). Canonical isoforms are stored in ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz, which is an order of magnitude bigger.