Question: Isoforms Missing In Uniprot Isoforms Fasta File
6.0 years ago, Pablo Pareja (Granada, Spain) wrote:

Hi,

I just ran into a problem: some isoforms specified in the UniProt XML files are not present in the FASTA file that is supposed to contain every isoform available in UniProt.

One example is the protein Q8IYH5, for which the XML lists the following isoforms:

<comment type="alternative products">
  <event type="alternative splicing"/>
  <isoform>
    <id>Q8IYH5-1</id>
    <name>1</name>
    <sequence type="displayed"/>
  </isoform>
  <isoform>
    <id>Q8IYH5-2</id>
    <name>2</name>
    <sequence type="described" ref="VSP_025511"/>
  </isoform>
  <isoform>
    <id>Q8IYH5-3</id>
    <name>3</name>
    <sequence type="described" ref="VSP_025509 VSP_025510"/>
    <note>No experimental confirmation available.</note>
  </isoform>
  <isoform>
    <id>Q8IYH5-4</id>
    <name>4</name>
    <sequence type="described" ref="VSP_025512 VSP_025513"/>
    <note>No experimental confirmation available.</note>
  </isoform>
</comment>

Most of them can be found in the general isoform FASTA file. However, the isoform with ID Q8IYH5-1 is not present in the file.

Do you know whether this is a bug or an isolated case? Is there any reason for it, or anything the missing isoforms have in common?

Thanks in advance,

Pablo


Hi Pablo, the isoform fasta file (uniprot_sprot_varsplic.fasta) only contains additional isoforms (see ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic) and not the "canonical" sequences, i.e. those marked <sequence type="displayed"/>

The isoform file needs to be combined with the file uniprot_sprot.dat (or uniprot_sprot.fasta) to obtain all isoform sequences.

See also

What is the canonical sequence? http://www.uniprot.org/faq/30

How to retrieve sets of protein sequences? http://www.uniprot.org/faq/38

Elisabeth

written 6.0 years ago by Elisabeth Gasteiger
6.0 years ago, Elisabeth Gasteiger (Geneva) wrote:

Sorry, this is my first post - I just posted my reply as a comment by mistake.

Here you go again:

The isoform fasta file (uniprot_sprot_varsplic.fasta) only contains additional isoforms (see ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/README.varsplic) and not the "canonical" sequences, i.e. those marked

<sequence type="displayed"/>

in the xml.

The isoform file needs to be combined with the file uniprot_sprot.dat (or uniprot_sprot.fasta) to obtain all isoform sequences.
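The combination step described above can be sketched in Python. This is only a minimal illustration, not an official UniProt tool: the helper names (read_fasta, accession, combine) are made up for this example, and it assumes the usual UniProt FASTA header layout ">sp|ACCESSION|ENTRY_NAME ...", with the isoform ID (e.g. Q8IYH5-2) in the accession position in the varsplic file. The sequences and entry names in the demo are toy placeholders, not real data.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession(header):
    """Return the second pipe-delimited field (assumed header layout),
    falling back to the first whitespace-delimited token."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else header.split()[0]


def combine(canonical, varsplic):
    """Merge canonical and additional-isoform records into one dict
    keyed by accession or isoform ID."""
    sequences = {}
    for handle in (canonical, varsplic):
        for header, seq in read_fasta(handle):
            sequences[accession(header)] = seq
    return sequences


# Toy stand-ins for uniprot_sprot.fasta and uniprot_sprot_varsplic.fasta.
canonical = io.StringIO(">sp|Q8IYH5|TOY_HUMAN toy canonical entry\nMAAAA\nKKKK\n")
varsplic = io.StringIO(
    ">sp|Q8IYH5-2|TOY_HUMAN Isoform 2 of toy entry\nMAAAA\n"
    ">sp|Q8IYH5-3|TOY_HUMAN Isoform 3 of toy entry\nMKKKK\n"
)
all_sequences = combine(canonical, varsplic)
print(sorted(all_sequences))
```

With real files, the two io.StringIO objects would be replaced by open("uniprot_sprot.fasta") and open("uniprot_sprot_varsplic.fasta"). The canonical sequence then sits under the plain accession (Q8IYH5), and the additional isoforms under their isoform IDs.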

See also

What is the canonical sequence? http://www.uniprot.org/faq/30

How to retrieve sets of protein sequences? http://www.uniprot.org/faq/38

Elisabeth


Thanks for your quick answer. I have a few remarks about this, though. Why not include information about each isoform, such as its name and sequence, in the XML files? Besides that, I just downloaded the file uniprot_sprot.fasta and there is no entry for the isoform Q8IYH5-1 there either. Should it be somewhere else?

written 6.0 years ago by Pablo Pareja

Isoform names are in the xml: see e.g. <name>1</name> in your above example.

Sequences are not; I can check for you whether this has ever been discussed.

Q8IYH5-1 is the displayed (canonical) sequence of Q8IYH5, and its sequence is therefore identical to that of Q8IYH5. uniprot_sprot.fasta has an entry for all canonical isoforms, including Q8IYH5.
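The rule described here (the displayed isoform is stored under the plain accession, the others under their isoform IDs) can be sketched in Python. This is a hedged illustration only: the function names split_isoforms and fasta_lookup_id are invented for this example, and the code parses the namespace-free XML fragment quoted in the question rather than a full UniProt XML file.

```python
import xml.etree.ElementTree as ET

# The "alternative products" comment quoted in the question.
SNIPPET = """
<comment type="alternative products">
  <event type="alternative splicing"/>
  <isoform>
    <id>Q8IYH5-1</id><name>1</name>
    <sequence type="displayed"/>
  </isoform>
  <isoform>
    <id>Q8IYH5-2</id><name>2</name>
    <sequence type="described" ref="VSP_025511"/>
  </isoform>
  <isoform>
    <id>Q8IYH5-3</id><name>3</name>
    <sequence type="described" ref="VSP_025509 VSP_025510"/>
  </isoform>
  <isoform>
    <id>Q8IYH5-4</id><name>4</name>
    <sequence type="described" ref="VSP_025512 VSP_025513"/>
  </isoform>
</comment>
"""


def split_isoforms(comment_xml):
    """Return (displayed_ids, other_ids) for one isoform comment."""
    root = ET.fromstring(comment_xml)
    displayed, others = [], []
    for isoform in root.iter("isoform"):
        iso_id = isoform.findtext("id")
        seq = isoform.find("sequence")
        if seq is not None and seq.get("type") == "displayed":
            displayed.append(iso_id)
        else:
            others.append(iso_id)
    return displayed, others


def fasta_lookup_id(iso_id, displayed):
    """ID to search for in the FASTA files: the plain accession for the
    displayed (canonical) isoform, the isoform ID itself otherwise."""
    if iso_id in displayed:
        return iso_id.rsplit("-", 1)[0]
    return iso_id


displayed, others = split_isoforms(SNIPPET)
for iso_id in displayed + others:
    print(iso_id, "->", fasta_lookup_id(iso_id, displayed))
```

Full UniProt XML files declare an XML namespace, so element lookups on a complete entry would need namespace-qualified tag names. The sketch above does not handle that.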

written 6.0 years ago by Elisabeth Gasteiger