Question: [Uniprot] The Protein Has Complete Sequence: Is It Always The Case?
Pals (Finland) wrote 6.3 years ago:

I have carried out modeling and docking studies on a protein. According to its entry in UniProt, the sequence of the protein is complete. However, I suspect that about 20-30 residues at its N-terminus are missing, because its template and the other structures from the same family have the corresponding region, and that region appears to be catalytically important too (in other structures of this family, these residues play an important role in holding the ligand in the right orientation). To verify this, I ran a BLAST search against the nr database and found no sequences matching the N-terminal-most region of my protein. For example, all of the homologous proteins start from residues 20-35. In this case, can I propose that the protein sequence is incomplete?
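A check like this can be made systematic by scanning tabular BLASTP output for hits whose alignment begins at the first residues of the query but well inside the hit. The following is only a minimal sketch. It assumes the default 12-column `-outfmt 6` layout of BLAST+, and it assumes the reading that the alignments begin at residues 20-35 of the homologs. The function name, the thresholds and the sample rows are hypothetical and made up for illustration.

```python
def flag_possible_truncation(blast_tsv, max_qstart=3, min_sstart=15):
    """Return (hit_id, missing_residues) for hits that suggest a truncated query N-terminus.

    Expects BLAST+ tabular output (-outfmt 6) with the default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    flagged = []
    for line in blast_tsv.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        sseqid = cols[1]
        qstart, sstart = int(cols[6]), int(cols[8])
        # The query aligns from its very start, but the hit has extra residues upstream
        if qstart <= max_qstart and sstart >= min_sstart:
            flagged.append((sseqid, sstart - qstart))
    return flagged


# Hypothetical rows, not real search results
example = "\n".join([
    "query\thitA\t82.0\t300\t54\t0\t1\t300\t28\t327\t1e-150\t520",
    "query\thitB\t60.0\t290\t116\t0\t5\t294\t6\t295\t1e-90\t340",
])
print(flag_possible_truncation(example))
```

If most of the close homologs are flagged with a similar number of upstream residues, that supports the idea of a truncated sequence, but it does not prove it.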

modeling uniprot docking • 1.3k views
written 6.3 years ago by Pals • modified 13 months ago by Biostar

Does your sequence come from SwissProt or TrEMBL?

written 6.3 years ago by Michael Schubert

It came from TrEMBL (Q56917)

written 6.3 years ago by Pals

The background to Michael's question is almost certainly that UniProt contains both well-curated proteins (from SwissProt and PIR) and automatically translated nucleotide sequences (from TrEMBL). The latter are much more likely to contain errors.
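If only the FASTA record is at hand, the section can usually be read from the header: UniProtKB FASTA headers begin with `sp|` for Swiss-Prot entries and `tr|` for TrEMBL entries, and sequences known to be partial carry `(Fragment)` in the protein name. A minimal sketch follows. The function name is hypothetical and the header in the example is invented for illustration, not a real entry.

```python
def describe_uniprot_header(header):
    """Classify a UniProtKB FASTA header as Swiss-Prot or TrEMBL and detect the fragment flag."""
    fields = header.lstrip(">").split("|", 2)
    if len(fields) != 3 or fields[0] not in ("sp", "tr"):
        raise ValueError("not a UniProtKB FASTA header: " + header)
    section = "Swiss-Prot (reviewed)" if fields[0] == "sp" else "TrEMBL (unreviewed)"
    return {
        "accession": fields[1],
        "section": section,
        # True only when the entry is already annotated as partial
        "flagged_fragment": "(Fragment)" in fields[2],
    }


# Invented header for illustration only
info = describe_uniprot_header(">tr|X00000|X00000_EXAMP Hypothetical enzyme (Fragment) OS=Example organism")
print(info)
```

A missing `(Fragment)` flag on a TrEMBL entry only means that nobody has annotated the sequence as partial; it is not evidence that the sequence is complete.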

written 6.3 years ago by Chris Evelo

It came from TrEMBL.

written 6.3 years ago by Pals

Yes, sorry, forgot to follow up on this.

written 6.3 years ago by Michael Schubert
Larry_Parnell (Boston, MA USA) wrote 6.3 years ago:

You certainly can! When I was analyzing gene models for Arabidopsis thaliana and human genome projects, this is precisely the kind of result that indicated an error in the gene model and hence in the conceptual translation into protein. In fact, most such errors were found at the N terminus, just as in your example. Without genomic sequence in hand, it may be difficult for you to model your protein - because you need to find a new exon 1 (maybe more because this gene model is very likely to extend farther in the 5' or upstream direction). One approach if that genomic sequence does not exist is to "borrow" the missing residues from the top BLASTP hit as a surrogate for the N terminus - for the purpose of modeling.
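The "borrowing" step can be sketched in a few lines. This is only an illustration under stated assumptions: the alignment is taken to be ungapped near its start, `qstart` and `sstart` are the 1-based alignment start positions reported by BLASTP, and the function name and the toy sequences are hypothetical.

```python
def borrow_n_terminus(query_seq, homolog_seq, qstart, sstart):
    """Return (borrowed, chimera): the homolog residues upstream of the query, and query with them prepended.

    qstart and sstart are the 1-based alignment start positions in the query and the homolog.
    """
    if qstart < 1 or sstart < qstart:
        raise ValueError("homolog does not extend beyond the query N-terminus")
    # Number of homolog residues that precede the position matching query residue 1
    n_extra = sstart - qstart
    borrowed = homolog_seq[:n_extra]
    return borrowed, borrowed + query_seq


# Toy sequences, made up for illustration
borrowed, chimera = borrow_n_terminus(
    query_seq="GSHMLEVLF",
    homolog_seq="MKTAYIAKQRGSHMLEVLF",
    qstart=1,
    sstart=11,
)
print(borrowed, chimera)
```

The resulting chimera is a surrogate for modeling only; the borrowed residues come from a different protein and should be reported as such.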

Please let me know if I should provide more details for finding the missing ~20-35 residues.

written 6.3 years ago by Larry_Parnell

Do you have access to any genome sequence data? This could be your own data, ESTs, or a genome project on the organism you're studying. If so, take the N-terminus of the protein matching at 82% and use it as the query in a search against those DNA sequences.
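A protein query against nucleotide sequences is what `tblastn` from NCBI BLAST+ does. The sketch below writes the N-terminal residues of the homolog to a FASTA file and assembles the command line; it does not run BLAST. The file names, the sequence, the 30-residue length and the helper names are hypothetical, and `-subject` is used on the assumption that the DNA is available as a plain FASTA file rather than a formatted database.

```python
import os
import shlex
import tempfile


def write_n_terminal_query(homolog_id, homolog_seq, n_residues, path):
    """Write the first n_residues of the homolog to a FASTA file and return them."""
    fragment = homolog_seq[:n_residues]
    with open(path, "w") as handle:
        handle.write(f">{homolog_id}_Nterm_1-{len(fragment)}\n{fragment}\n")
    return fragment


def build_tblastn_command(query_fasta, subject_fasta, evalue=10.0):
    """Build a tblastn call (protein query vs. translated DNA) with tabular output."""
    return [
        "tblastn",
        "-query", query_fasta,
        "-subject", subject_fasta,  # plain FASTA of genomic or EST sequences
        "-evalue", str(evalue),     # kept permissive because the query is short
        "-outfmt", "6",
    ]


# Made-up homolog sequence and file names, for illustration only
workdir = tempfile.mkdtemp()
query_path = os.path.join(workdir, "nterm_query.fasta")
fragment = write_n_terminal_query("homolog82", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", 30, query_path)
command = build_tblastn_command(query_path, "genomic_or_est_reads.fasta")
print(shlex.join(command))
```

The printed command can be run in a shell, or passed to `subprocess.run(command)`, once BLAST+ is installed and the DNA file exists.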

written 6.3 years ago by Larry_Parnell

Thanks Larry!! The top BLASTP hit after the protein itself is another protein with 82% sequence identity. In that case, how can we be sure that its N-terminal residues are the ones that should be present in our protein? Of course, it could provide the secondary structure needed for docking studies, and I am sure the catalytically significant residues are present in that protein too, but if I am unable to get the exact residues, I don't think it would make much sense.

written 6.3 years ago by Pals

This protein is from Yersinia enterocolitica. However, its genome has not been sequenced yet (sequencing is probably under way).

written 6.3 years ago by Pals

I searched for that sequence but did not get satisfactory results, because the strain Ye O:3 has not been sequenced yet. However, I did try Ensembl.

written 6.3 years ago by Pals
Jerven wrote 6.3 years ago:

Yes, the real protein as actually expressed does not always match the sequence reported in UniProtKB. This is especially true for sequences in the TrEMBL section, as the protein sequence prediction can be very bad, which means care needs to be taken, as you are doing now.

However, you can always request an update of the sequence, and that it be integrated into Swiss-Prot, using the contact link on uniprot.org (top right in the blue bar). There is no guarantee that curator time is available, but it never hurts to ask.

written 6.3 years ago by Jerven

Yes, I have asked for an update. But it's very unlikely that they will add the missing residues. Instead they might mark it as an incomplete sequence..:)

written 6.3 years ago by Pals